I am trying to replace missing or "tbd" user_score values with the average user_score for the game's platform and genre. This is my code:
dft = new_df.query('user_score != "tbd" & user_score.isnull()')
df_typical_user_ratio_by_platform = dft.groupby(['platform', 'genre'])['user_score'].apply(lambda x: x.sample(1).iloc[0])
def correct_user_score(row):
    platform = row['platform']
    genre = row['genre']
    if (row['user_score'] == 'tbd' or pd.isnull(row['user_score']) or row['user_score']=='nan'):
        u = df_typical_user_ratio_by_platform.loc[[platform, genre]].head(1).astype('float')
        uScore = ", ".join(map(str, u))
    else:
        uScore = row['user_score']
    return uScore
row = pd.Series(data=row_values, index=['user_score', 'platform', 'genre'])
correct_user_score(row)
new_df['user_score'] = new_df.apply(correct_user_score, axis=1)
new_df.sample(40)
# df['user_score'] = df['user_score'].astype('int')
This is the result. The user_score column currently has dtype object. I'm not sure how to replace the NaN values. I tried doing if u = 'nan', but that didn't work. Any advice?
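
For reference, here is a minimal sketch of the intended result: fill each missing score with the mean for its platform and genre. It is an illustration under stated assumptions, not a tested fix for the full dataset. The small DataFrame is made-up sample data standing in for new_df, and only the column names come from the code above. Note also that the query above appears to keep only rows where user_score is null, so the per-group lookup has nothing but NaN to draw from.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for new_df (values are made up)
new_df = pd.DataFrame({
    "platform": ["PS2", "PS2", "PS2", "Wii", "Wii", "Wii"],
    "genre": ["Action", "Action", "Action", "Sports", "Sports", "Sports"],
    "user_score": ["8.0", "tbd", "6.0", np.nan, "7.0", "9.0"],
})

# Convert to numbers; "tbd" and any other non-numeric text become NaN
new_df["user_score"] = pd.to_numeric(new_df["user_score"], errors="coerce")

# Mean user_score per (platform, genre), aligned row by row with new_df.
# NaN values are skipped when the mean is computed.
group_mean = new_df.groupby(["platform", "genre"])["user_score"].transform("mean")

# Fill only the missing scores; existing scores are left untouched
new_df["user_score"] = new_df["user_score"].fillna(group_mean)

print(new_df)
```

After this the column is float rather than object, so real NaN values can be checked with isna() instead of comparing against the string 'nan'. A platform/genre group with no scores at all would still be NaN and would need a fallback, such as the platform-wide mean.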