I've been struggling to replace NaN values with the mode in my code. Pandas Series.median() seems to work just fine, but when I try Series.mode(), the code runs without any errors and doesn't do anything (i.e. the NaN values are still there).
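A minimal sketch of the Series.mode() behaviour described above, using made-up data. The series `s` and its values are assumptions for illustration, not my real column, and it assumes the mode was passed straight to fillna. Series.mode() returns a Series rather than a single value, and fillna aligns a Series argument on the index, which would explain NaN values being left in place.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for a column of strings with one missing value.
s = pd.Series(['1', '1', np.nan, '2'])

# Series.mode() returns a Series (there can be ties), not a scalar.
mode_values = s.mode()

# Passing a Series to fillna aligns on the index, so only a NaN sitting
# at index 0 would be filled; the NaN at index 2 is left untouched.
still_missing = s.fillna(mode_values)

# Taking the first mode as a scalar fills every NaN.
filled = s.fillna(mode_values.iloc[0])
```

Series.median() does not show this problem because it already returns a scalar.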
So I tried this instead:
def findmode(series):
    nums = list(series)
    nums.sort()
    counts = dict()
    for i in nums:
        counts[i] = counts.get(i, 0) + 1
    mode = max(counts, key=counts.get)
    return mode
# Converting "pickup_datetime" to pandas datetime.
x_train['pickup_datetime'] = pd.to_datetime(x_train['pickup_datetime'], format='%Y-%m-%d %H:%M:%S %Z').replace('nan', pd.NA)
# Creating day_of_week column and replacing NaN values with mode.
x_train['day_of_week'] = x_train['pickup_datetime'].dt.strftime('%w')
x_train['day_of_week'] = x_train['day_of_week'].fillna(findmode(x_train['day_of_week']))
When I run the above, I get an error pointing at the sort in my function:
Input In [21], in findmode(series)
      1 def findmode(series):
      2     nums = list(series)
----> 3     nums.sort()
      4     counts = dict()
      5     for i in nums:

TypeError: '<' not supported between instances of 'float' and 'str'
Note that the function above successfully extracts the mode when I feed it a column that doesn't have NaN values. Since replacing NaN values is the whole point, I seem to be in some sort of catch-22.
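A minimal sketch of that situation with made-up data. The series `days` and the helper `findmode_skipna` are hypothetical names for illustration. It assumes the missing entries are float NaN sitting next to the strings produced by strftime, which matches the 'float' and 'str' in the error message. Dropping the missing values before counting is one possible way around the comparison.

```python
import numpy as np
import pandas as pd

def findmode_skipna(series):
    # Hypothetical variant of findmode: drop missing values first so that
    # only the real (string) values are counted. No sorting is needed.
    counts = {}
    for value in series.dropna():
        counts[value] = counts.get(value, 0) + 1
    return max(counts, key=counts.get)

# Made-up data: strftime-style strings plus a missing value stored as NaN.
days = pd.Series(['3', '5', np.nan, '5'], dtype=object)

try:
    sorted(days)  # the same kind of comparison nums.sort() performs
    sort_failed = False
except TypeError:
    sort_failed = True  # a float NaN cannot be ordered against a str

# Fill the missing entry with the mode of the non-missing values.
filled_days = days.fillna(findmode_skipna(days))
```

The fillna call itself is the same as in my code above; only the mode calculation ignores the NaN entries.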
Also, there might be an issue with how I call the function in my code, syntax and whatnot. I'm still a beginner and have tried a bunch of things, but now I'm definitely stuck. Thanks!