Replacing a specific column with mean values

Question

I am trying to replace the user_score with the average user_score for the game's platform and genre. This is my code:

dft = new_df.query('user_score != "tbd" & user_score.isnull()')
df_typical_user_ratio_by_platform = dft.groupby(['platform', 'genre'])['user_score'].apply(lambda x: x.sample(1).iloc[0])

def correct_user_score(row):
    platform = row['platform']
    genre = row['genre']
    if (row['user_score'] == 'tbd' or pd.isnull(row['user_score']) or row['user_score']=='nan'):
        u = df_typical_user_ratio_by_platform.loc[[platform, genre]].head(1).astype('float')
        uScore = ", ".join(map(str, u)) 
    else:
        uScore = row['user_score']
        
    return uScore

row = pd.Series(data=row_values, index=['user_score', 'platform', 'genre'])
correct_user_score(row)
new_df['user_score'] = new_df.apply(correct_user_score, axis=1)
new_df.sample(40)
# df['user_score'] = df['user_score'].astype('int')

This is the result. user_score is currently an object. I'm not sure how to replace nan. I tried doing if u = 'nan', but that didn't work. Any advice?

https://i.stack.imgur.com/g7AU4.jpg

Here are some ways to replace nan: https://www.geeksforgeeks.org/replace-nan-values-with-zeros-in-pandas-dataframe/ — LevB, Feb 20 '21 at 05:11
Your image shows "NaN", which is of course not equal to "nan". Are you actually getting the string "NaN", or are you getting the floating point value NaN? Those are also two different things. — Tim Roberts, Feb 20 '21 at 06:50
Try [this](https://stackoverflow.com/a/60203797/11380795) solution — RJ Adriaansen, Feb 20 '21 at 06:51
Please add sample data and the expected output; the whole approach looks more complex than needed. — Rob Raymond, Feb 20 '21 at 07:13
Hi so I'm trying to fix the 'user_score' column only right now and it does have object 'nan' in it which is different from 'NaN'. @TimRoberts — Libby, Feb 20 '21 at 08:16
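
The distinction raised in the comments (the string 'nan', the string 'NaN', and a real floating-point NaN) can be checked directly. The following is a minimal sketch on a made-up Series, not the asker's data; the names s, is_text_nan, is_missing and cleaned are purely illustrative.

```python
import numpy as np
import pandas as pd

# Made-up object column mixing numeric strings, text placeholders and a real NaN.
s = pd.Series(["7.5", "nan", "NaN", np.nan, "tbd"], dtype=object)

# String comparison only matches the exact text "nan" (case-sensitive).
is_text_nan = s == "nan"

# isna() only flags the real missing value, not the text look-alikes.
is_missing = s.isna()

# Coercing to numeric treats every non-numeric entry as a float NaN.
cleaned = pd.to_numeric(s, errors="coerce")

print(is_text_nan.tolist())
print(is_missing.tolist())
print(cleaned.tolist())
```

Because a float NaN never compares equal to anything (including itself), an equality test against 'nan' cannot catch it; isna() or a numeric coercion is the usual way to detect it.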

score 0 · Answer 1 · answered Feb 20 '21 at 07:44

Force invalid values to NaN with pd.to_numeric(..., errors="coerce").
Then fillna() with whatever calculation you want (here, the mean of each group).

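import numpy as np
import pandas as pd

s = 20
df = pd.DataFrame({"userid":np.random.randint(1,5,s),
             "platform":np.random.choice(["windows","macos","ios","android"],s),
             "userscore":np.random.randint(1,10,s)})

# let's splat some scores in the first 10 rows: 7 -> "tbd", 6 -> "nan"
# (every choice is a string, so np.select never has to mix str and numeric dtypes)
df = df.assign(userscore=np.select([(df.userscore==7)&(df.index<10),(df.userscore==6)&(df.index<10)],["tbd","nan"],df.userscore.astype(str)))

# keep the raw values for comparison
df["bad"] = df.userscore
# anything non-numeric ("tbd", "nan") becomes a real float NaN
df = df.assign(userscore=pd.to_numeric(df.userscore, errors="coerce"))
# fill each NaN with the mean of its (userid, platform) group
df["userscore"] = df.userscore.fillna(df.groupby(["userid","platform"])["userscore"].transform("mean"))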

output

	userid	platform	userscore	bad
0	3	ios	8	8
1	3	ios	5	5
2	1	macos	4.5	tbd
3	2	macos	3	3
4	2	android	3	3
5	2	ios	4	4
6	1	macos	5	5
7	4	android	8	nan
8	1	macos	4	4
9	2	windows	2	2
10	2	android	1	1
11	4	windows	5	5
12	3	android	2	2
13	2	windows	9	9
14	3	android	8	8
15	2	windows	1	1
16	4	windows	8	8
17	2	windows	4	4
18	2	ios	3	3
19	4	android	8	8
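
The same two steps can be applied to the column names from the question (platform, genre, user_score). This is only a sketch: the sample rows below are invented, and it assumes the missing entries are "tbd", the string "nan", or a real NaN, as described in the question and comments.

```python
import numpy as np
import pandas as pd

# Hypothetical rows shaped like the asker's frame; the values are made up.
new_df = pd.DataFrame({
    "platform":   ["PS4", "PS4", "PS4", "PC", "PC", "PC", "Wii"],
    "genre":      ["Action", "Action", "Action", "RPG", "RPG", "RPG", "Sports"],
    "user_score": ["8.0", "tbd", "6.0", np.nan, "7.5", "nan", "tbd"],
})

# Step 1: coerce to numbers; "tbd", "nan" and real NaN all become float NaN.
scores = pd.to_numeric(new_df["user_score"], errors="coerce")

# Step 2: mean per (platform, genre), aligned row by row with the original frame.
# NaN values are skipped when the mean is computed.
group_mean = scores.groupby([new_df["platform"], new_df["genre"]]).transform("mean")

# Step 3: fill only the missing scores; existing scores are left untouched.
new_df["user_score"] = scores.fillna(group_mean)

print(new_df)
```

A group with no valid score at all (the Wii/Sports row here) has no mean to borrow, so it stays NaN. Whether to fall back to something broader, such as the platform-wide or overall mean, is a separate decision that depends on the data.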
