
I've been struggling to replace NaN values with the mode in my code. Pandas Series.median() seems to work just fine, but when I try Series.mode(), the code runs without any errors but doesn't do anything (the NaN values are still there).
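The exact Series.mode() call isn't shown, so the following is only a sketch of one common cause, assuming the attempt looked like `s.fillna(s.mode())`. The sample values and variable names are made up for illustration. `Series.mode()` returns a Series rather than a scalar (there can be ties), and `fillna` with a Series aligns on the index, so only missing values at matching labels get filled:

```python
import pandas as pd

# Hypothetical sample column: weekday strings with two missing values.
s = pd.Series(["1", float("nan"), "3", "3", float("nan")], dtype=object)

# mode() returns a Series, here a single row with index label 0.
modes = s.mode()

# Passing that Series to fillna aligns on the index: only a NaN at label 0
# would be filled, and label 0 is not missing, so nothing changes.
filled_wrong = s.fillna(modes)

# Taking the first mode as a scalar fills every missing value.
filled_right = s.fillna(modes.iloc[0])

print(filled_wrong.isna().sum())
print(filled_right.tolist())
```

If there are several modes, `modes.iloc[0]` simply picks the first one that pandas returns.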

So I tried this instead:

def findmode(series):
    nums = list(series)
    nums.sort()
    counts = dict()
    for i in nums:
        counts[i] = counts.get(i, 0) + 1
    mode = max(counts, key=counts.get)
    return mode

# Converting "pickup_datetime" to pandas datetime.

x_train['pickup_datetime'] =  pd.to_datetime(x_train['pickup_datetime'], format='%Y-%m-%d %H:%M:%S %Z').replace('nan', pd.NA)

# Creating day_of_week column and replacing NaN values with mode.

x_train['day_of_week'] = x_train['pickup_datetime'].dt.strftime('%w')
x_train['day_of_week'] = x_train['day_of_week'].fillna(findmode(x_train['day_of_week']))

When I run the above, I get a weird error message from inside my function:

Input In [21], in findmode(series)
      1 def findmode(series):
      2     nums = list(series)
----> 3     nums.sort()
      4     counts = dict()
      5     for i in nums:

TypeError: '<' not supported between instances of 'float' and 'str'

It's important to note that the function above successfully extracts the mode when I feed it a column that doesn't have NaN. Since I'm trying to replace NaN values, it seems like I'm in some sort of catch-22.
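The TypeError comes from sorting a list that mixes strings (the weekday values) with NaN, which is a float. A minimal sketch of a way around it, assuming it is acceptable to ignore missing values when counting: drop them before computing the mode. The name `find_mode` and the sample data are illustrative, and `collections.Counter` stands in for the hand-written dict.

```python
from collections import Counter

import pandas as pd


def find_mode(series):
    """Return the most frequent non-missing value, or None if there is none."""
    # dropna() removes NaN first, so floats and strings are never mixed.
    values = series.dropna().tolist()
    if not values:
        return None
    # most_common(1) returns [(value, count)] for the top value; no sort needed.
    return Counter(values).most_common(1)[0][0]


# Hypothetical sample column: weekday strings with two missing values.
days = pd.Series(["3", float("nan"), "5", "5", float("nan")], dtype=object)

# Reproduce the original failure: NaN (float) cannot be ordered against str.
try:
    sorted(days.tolist())
    sort_failed = False
except TypeError:
    sort_failed = True

filled = days.fillna(find_mode(days))
print(sort_failed, filled.tolist())
```

Sorting was never needed to count occurrences, so removing it (together with the NaN values) avoids the comparison entirely.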

Also, there might be an issue with how I put the function in my code, the syntax and whatnot. I'm still a beginner and I've tried a bunch of things, but now I'm definitely stuck. Thanks!

Alex
  • It would help to show how you tried `Series.mode()` because whatever you tried may be fixable. – sj95126 Nov 23 '22 at 16:38
  • Answered here https://stackoverflow.com/questions/42789324/how-to-pandas-fillna-with-mode-of-column – jprebys Nov 23 '22 at 16:39
