3

I'm trying to write a fillna() call or a lambda function in Pandas that checks whether the 'user_score' column is NaN and, if so, takes the value from a column in another DataFrame. I tried two options:

games_data['user_score'].fillna(
    genre_score[games_data['genre']]['user_score']
    if np.isnan(games_data['user_score'])
    else games_data['user_score'],
    inplace = True
)

# but this raises 'ValueError: The truth value of a Series is ambiguous'

and

games_data['user_score'] = games_data.apply(
    lambda row: 
    genre_score[row['genre']]['user_score'] 
    if np.isnan(row['user_score'])
    else row['user_score'],
    axis=1
)

# but this raises a 'KeyError' with another column from games_data

My dataframes:

games_data

[image: sample rows of games_data]

genre_score

[image: sample rows of genre_score]

I will be glad for any help!

MaxB

3 Answers

2

You can also fillna() directly with the user_score_by_genre mapping:

user_score_by_genre = games_data.genre.map(genre_score.user_score)
games_data.user_score = games_data.user_score.fillna(user_score_by_genre)
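Here is a minimal sketch of those two lines on made-up data. The sample rows are hypothetical, and it assumes genre_score has one row per genre with the genre as its index, since the real frames were only posted as images:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the real frames are not shown in the question.
games_data = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "genre": ["Action", "Sports", "Action", "Puzzle"],
    "user_score": [8.0, np.nan, np.nan, 6.5],
})

# Assumed layout: one row per genre, with the genre as the index.
genre_score = pd.DataFrame(
    {"user_score": [7.1, 6.8, 7.5]},
    index=pd.Index(["Action", "Sports", "Puzzle"], name="genre"),
)

# Look up each row's genre in genre_score; the result shares games_data's index.
user_score_by_genre = games_data["genre"].map(genre_score["user_score"])

# Only the NaN entries are replaced; existing scores are left untouched.
games_data["user_score"] = games_data["user_score"].fillna(user_score_by_genre)
print(games_data)
```

Because map() returns a Series aligned with games_data, fillna() can match the fallback values row by row even though genre_score is much shorter.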

BTW, if games_data.user_score can never deviate from the genre_score values, you can skip the fillna() and assign the mapping directly to games_data.user_score:

games_data.user_score = games_data.genre.map(genre_score.user_score)

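Pandas' built-in Series.where also works. Note that it keeps the values where the condition is True and replaces the rest, so the condition has to be notna(), and the replacement has to be the genre mapping so that the rows line up:

games_data.user_score = games_data.user_score.where(games_data.user_score.notna(), user_score_by_genre)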
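To make the direction of the condition concrete, here is a small sketch comparing Series.where with its inverse, Series.mask. The two Series are hypothetical stand-ins for the scores and the per-row genre averages:

```python
import numpy as np
import pandas as pd

# Hypothetical values: existing scores and a same-length fallback.
scores = pd.Series([8.0, np.nan, np.nan, 6.5])
fallback = pd.Series([7.1, 6.8, 7.1, 7.5])

# where: keep scores where the condition is True, take fallback elsewhere.
filled_where = scores.where(scores.notna(), fallback)

# mask: the inverse, replace scores where the condition is True.
filled_mask = scores.mask(scores.isna(), fallback)

print(filled_where.equals(filled_mask))
```

Passing isna() to where would do the opposite of what is wanted: it would keep the NaN entries and overwrite the real scores.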

tdy
  • Thanks for your answer, but this is not exactly what I need. Please check my comment on Mayank Porwal's answer. – MaxB Mar 30 '21 at 10:41
  • @MaxB Looking at it closer, I'm not clear on what the dataframes look like, especially `genre_score`. Could you post some sample rows from both `games_data` and `genre_score`? – tdy Mar 30 '21 at 23:25
  • Just did it, you can check the question. I also found a solution and will post it soon. – MaxB Mar 30 '21 at 23:47
  • @MaxB Looks like you got a working solution already, but just for reference I added another option of using `fillna(user_score_by_genre)` directly. – tdy Mar 31 '21 at 00:19
  • @MaxB No problem. BTW is it technically possible for `games_data.user_score` to deviate from the `genre_score` values? If not, you can actually skip the `fillna()` and just assign the mapping directly: `games_data.user_score = games_data.genre.map(genre_score.user_score)` – tdy Mar 31 '21 at 00:52
  • I think that's not a good idea, because I only fill the gaps in games_data with the average score for each genre from genre_score. Assigning the mapping to every row would blur the data. – MaxB Mar 31 '21 at 12:53
1

Use numpy.where:

import numpy as np

df1['user_score'] = np.where(df1['user_score'].isna(), df2['user_score'], df1['user_score'])
Mayank Porwal
  • Thanks, but this is not exactly what I need. df1 and df2 have different sizes, so I have to select a specific value by key from df2 for every NaN value in df1, as indicated in my example `df2[ df1['genre'] ]['user_score']`. As a result, it could be something like `df1['user_score'] = np.where(df1['user_score'].isna(), df2[df1['genre']]['user_score'], df1['user_score'])` – MaxB Mar 30 '21 at 10:38
1

I found part of the solution here.

I use Series.map:

user_score_by_genre = games_data['genre'].map(genre_score['user_score'])

After that, I use @MayankPorwal's answer:

games_data['user_score'] = np.where(games_data['user_score'].isna(), user_score_by_genre, games_data['user_score'])

I'm not sure it is the best way, but it works for me.
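For reference, a self-contained sketch of this map plus np.where combination. The data is made up, genre_score is assumed to be indexed by genre, and the "Racing" genre is deliberately left out of genre_score to show what happens to unmatched rows:

```python
import numpy as np
import pandas as pd

# Hypothetical data; "Racing" has no entry in genre_score on purpose.
games_data = pd.DataFrame({
    "genre": ["Action", "Racing", "Sports"],
    "user_score": [np.nan, np.nan, 9.0],
})
genre_score = pd.DataFrame(
    {"user_score": [7.1, 6.8]},
    index=["Action", "Sports"],  # assumed: genre is the index
)

user_score_by_genre = games_data["genre"].map(genre_score["user_score"])

# np.where works positionally, so both inputs must have the same length,
# which map() guarantees here.
games_data["user_score"] = np.where(
    games_data["user_score"].isna(),
    user_score_by_genre,
    games_data["user_score"],
)

# Rows whose genre is absent from genre_score are still NaN afterwards.
still_missing = int(games_data["user_score"].isna().sum())
print(games_data)
print(still_missing)
```

If any rows are still missing after this step, their genre has no average in genre_score, so they need a separate fallback.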

MaxB