Use fillna() and lambda function in Pandas to replace NaN values

Question

I'm trying to write fillna() or a lambda function in Pandas that checks whether a value in the 'user_score' column is NaN and, if so, uses the matching value from another DataFrame. I tried two options:

games_data['user_score'].fillna(
    genre_score[games_data['genre']]['user_score']
    if np.isnan(games_data['user_score'])
    else games_data['user_score'],
    inplace = True
)

# but this raises 'ValueError: The truth value of a Series is ambiguous'

and

games_data['user_score'] = games_data.apply(
    lambda row: 
    genre_score[row['genre']]['user_score'] 
    if np.isnan(row['user_score'])
    else row['user_score'],
    axis=1
)

# but this raises a 'KeyError' for a value taken from games_data
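Both errors can be reproduced with a small sketch. It uses hypothetical sample data (the real frames were only posted as images) and assumes genre_score is indexed by genre with a 'user_score' column, which is what the map-based answers below rely on. The first attempt fails because `if` needs a single bool while np.isnan() on a column returns a whole boolean Series; the second fails because `genre_score[key]` selects a column named key, not a row. Switching to `.loc[genre, 'user_score']` and pd.isna() makes the row-wise version work:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the asker's frames.
games_data = pd.DataFrame({
    "genre": ["Action", "Puzzle", "Action"],
    "user_score": [8.0, np.nan, np.nan],
})
# Assumption: genre_score is indexed by genre and has a 'user_score' column.
genre_score = pd.DataFrame({"user_score": [7.2, 6.5]}, index=["Action", "Puzzle"])

# Attempt 1: np.isnan() on a column gives a boolean Series, which `if` cannot reduce.
value_error = ""
try:
    if np.isnan(games_data["user_score"]):
        pass
except ValueError as exc:
    value_error = str(exc)

# Attempt 2: df[key] looks for a *column* called key, so a genre name raises KeyError.
key_error = ""
try:
    genre_score["Action"]
except KeyError as exc:
    key_error = str(exc)

# Row-wise fix: .loc[row_label, column] looks up the genre's row instead.
games_data["user_score"] = games_data.apply(
    lambda row: (
        genre_score.loc[row["genre"], "user_score"]
        if pd.isna(row["user_score"])
        else row["user_score"]
    ),
    axis=1,
)
print(games_data["user_score"].tolist())  # [8.0, 6.5, 7.2]
```

This calls a Python function once per row, so on large frames the vectorised map-based answers are usually faster.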

My dataframes: games_data and genre_score (sample rows were shown as images).

Any help would be appreciated!

tdy · Accepted Answer · 2021-03-31T00:58:15.263


You can also pass the user_score_by_genre mapping directly to fillna():

user_score_by_genre = games_data.genre.map(genre_score.user_score)
games_data.user_score = games_data.user_score.fillna(user_score_by_genre)
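A self-contained sketch of the two lines above, using hypothetical sample data and assuming genre_score is indexed by genre. Series.map looks each genre up in the index of genre_score.user_score, and fillna() then fills only the NaN rows, aligning on games_data's index:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; genre_score is assumed to be indexed by genre.
games_data = pd.DataFrame({
    "genre": ["Action", "Puzzle", "Action", "Sports"],
    "user_score": [8.0, np.nan, np.nan, 5.5],
})
genre_score = pd.DataFrame(
    {"user_score": [7.2, 6.5, 6.9]},
    index=["Action", "Puzzle", "Sports"],
)

# One fallback score per games_data row, looked up by genre.
user_score_by_genre = games_data.genre.map(genre_score.user_score)

# Only the NaN entries are replaced; existing scores are kept.
games_data.user_score = games_data.user_score.fillna(user_score_by_genre)
print(games_data.user_score.tolist())  # [8.0, 6.5, 7.2, 5.5]
```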

BTW if games_data.user_score will never deviate from the genre_score values, you can skip the fillna() and just assign directly to games_data.user_score:

games_data.user_score = games_data.genre.map(genre_score.user_score)
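The shortcut only gives the same result when no game has its own score. A small sketch with hypothetical data shows the difference: fillna() keeps a known score, while assigning the mapping directly replaces it with the genre value:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; genre_score is assumed to be indexed by genre.
games_data = pd.DataFrame({
    "genre": ["Action", "Puzzle"],
    "user_score": [8.0, np.nan],
})
genre_score = pd.DataFrame({"user_score": [7.2, 6.5]}, index=["Action", "Puzzle"])

mapped = games_data.genre.map(genre_score.user_score)

filled = games_data.user_score.fillna(mapped)  # keeps the game's own 8.0
overwritten = mapped                           # 8.0 becomes the genre value 7.2

print(filled.tolist())       # [8.0, 6.5]
print(overwritten.tolist())  # [7.2, 6.5]
```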

~~Pandas' built-in Series.where also works and is a bit more concise:~~

~~df1.user_score.where(df1.user_score.isna(), df2.user_score, inplace=True)~~
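Note that the struck-out line has its condition inverted: Series.where(cond, other) keeps values where cond is True and replaces the rest, so where(isna(), ...) keeps the NaNs and overwrites the real scores. It also aligns df2.user_score by index rather than by genre. A sketch of working where/mask variants, reusing the genre mapping on hypothetical data:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; genre_score is assumed to be indexed by genre.
games_data = pd.DataFrame({
    "genre": ["Action", "Puzzle", "Action"],
    "user_score": [8.0, np.nan, np.nan],
})
genre_score = pd.DataFrame({"user_score": [7.2, 6.5]}, index=["Action", "Puzzle"])
user_score_by_genre = games_data.genre.map(genre_score.user_score)

# where: keep values where the condition holds (not NaN), otherwise take `other`.
via_where = games_data.user_score.where(games_data.user_score.notna(), user_score_by_genre)

# mask is the mirror image: replace values where the condition holds (is NaN).
via_mask = games_data.user_score.mask(games_data.user_score.isna(), user_score_by_genre)

print(via_where.tolist())          # [8.0, 6.5, 7.2]
print(via_mask.equals(via_where))  # True
```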

edited Mar 31 '21 at 00:58

answered Mar 28 '21 at 15:45

tdy



Thanks for your answer, but this is not exactly what I need. Please see my comment on Mayank Porwal's answer – MaxB Mar 30 '21 at 10:41
@MaxB Looking at it closer, I'm not clear on what the dataframes look like, especially `genre_score`. Could you post some sample rows from both `games_data` and `genre_score`? – tdy Mar 30 '21 at 23:25

Just did it; you can check the question. I also found a solution and will post it soon – MaxB Mar 30 '21 at 23:47

@MaxB Looks like you got a working solution already, but just for reference I added another option of using `fillna(user_score_by_genre)` directly. – tdy Mar 31 '21 at 00:19
@MaxB No problem. BTW is it technically possible for `games_data.user_score` to deviate from the `genre_score` values? If not, you can actually skip the `fillna()` and just assign the mapping directly: `games_data.user_score = games_data.genre.map(genre_score.user_score)` – tdy Mar 31 '21 at 00:52

I don't think that's a good idea: I use the per-genre average from genre_score only to fill the gaps in games_data, so overwriting every score with it would blur the data. – MaxB Mar 31 '21 at 12:53

Mayank Porwal · Answer 2 · 2021-03-30T06:53:40.383


Use numpy.where:

import numpy as np

df1['user_score'] = np.where(df1['user_score'].isna(), df2['user_score'], df1['user_score'])
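np.where works element by element on arrays of the same (or broadcastable) shape; it does not align on an index or look anything up by genre. That is the limitation MaxB raises for frames of different sizes. A sketch with hypothetical arrays: a 2-element genre array cannot be combined with 3 game scores, while a per-game fallback array of matching length works:

```python
import numpy as np

# Hypothetical values: 3 games, but only 2 genre scores.
user_scores = np.array([8.0, np.nan, np.nan])
genre_scores = np.array([7.2, 6.5])

error_message = ""
try:
    np.where(np.isnan(user_scores), genre_scores, user_scores)
except ValueError as exc:
    error_message = str(exc)  # shapes (3,) and (2,) cannot be broadcast

# With one fallback value per game (same length), np.where does what is wanted.
per_game_fallback = np.array([7.2, 6.5, 7.2])
result = np.where(np.isnan(user_scores), per_game_fallback, user_scores)
print(result.tolist())  # [8.0, 6.5, 7.2]
```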

edited Mar 30 '21 at 06:53

answered Mar 28 '21 at 15:41

Mayank Porwal


Thanks, but this is not exactly what I need. df1 and df2 have different sizes, so for every NaN value in df1 I have to select a specific value by key from df2, as in my example `df2[ df1['genre'] ]['user_score']`. The result could be something like `df1['user_score'] = np.where(df1['user_score'].isna(), df2[df1['genre']]['user_score'], df1['user_score'])` – MaxB Mar 30 '21 at 10:38

score 1 · Answer 3 · answered Mar 30 '21 at 23:52

I found part of the solution here.

I use Series.map:

user_score_by_genre = games_data['genre'].map(genre_score['user_score'])

Then I apply @MayankPorwal's answer:

games_data['user_score'] = np.where(games_data['user_score'].isna(), user_score_by_genre, games_data['user_score'])
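For comparison, a sketch with hypothetical data (genre_score assumed to be indexed by genre) suggesting that map plus np.where gives the same values as passing the mapping to fillna(). The difference is the return type: np.where returns a NumPy array, which pandas turns back into a column on assignment, while fillna() returns a Series directly:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; genre_score is assumed to be indexed by genre.
games_data = pd.DataFrame({
    "genre": ["Action", "Puzzle", "Action"],
    "user_score": [8.0, np.nan, np.nan],
})
genre_score = pd.DataFrame({"user_score": [7.2, 6.5]}, index=["Action", "Puzzle"])

user_score_by_genre = games_data["genre"].map(genre_score["user_score"])

via_np_where = np.where(
    games_data["user_score"].isna(), user_score_by_genre, games_data["user_score"]
)
via_fillna = games_data["user_score"].fillna(user_score_by_genre)

print(np.array_equal(via_np_where, via_fillna.to_numpy()))  # True
```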

I'm not sure it's the best way, but it works for me.
