-1

I have a df with a column, Critic_Score, that has NaN values. I am trying to replace them with the average of the Critic Scores from the same platform. This question has been asked on stack overflow several times and I used 4 suggestions that did not give me the desired output. Please tell me how to fix this.

This is a subset of the df:

x[['Platform','Critic_Score']].head()

Platform    Critic_Score
0   wii 76.0
1   nes NaN
2   wii 82.0
3   wii 80.0
4   gb  NaN

More information on the original df:

x.head().to_dict('list')
{'Name': ['wii sports',
  'super mario bros.',
  'mario kart wii',
  'wii sports resort',
  'pokemon red/pokemon blue'],
 'Platform': ['wii', 'nes', 'wii', 'wii', 'gb'],
 'Year_of_Release': [2006.0, 1985.0, 2008.0, 2009.0, 1996.0],
 'Genre': ['sports', 'platform', 'racing', 'sports', 'role-playing'],
 'NA_sales': [41.36, 29.08, 15.68, 15.61, 11.27],
 'EU_sales': [28.96, 3.58, 12.76, 10.93, 8.89],
 'JP_sales': [3.77, 6.81, 3.79, 3.28, 10.22],
 'Other_sales': [8.45, 0.77, 3.29, 2.95, 1.0],
 'Critic_Score': [76.0, nan, 82.0, 80.0, nan],
 'User_Score': ['8', nan, '8.3', '8', nan],
 'Rating': ['E', nan, 'E', 'E', nan]}

These are the statements I tried followed by their output:

1.

x['Critic_Score'] = x['Critic_Score'].fillna(x.groupby('Platform')['Critic_Score'].transform('mean'), inplace = True)

0    None
1    None
2    None
3    None
4    None
Name: Critic_Score, dtype: object
x.loc[x.Critic_Score.isnull(), 'Critic_Score'] = x.groupby('Platform').Critic_Score.transform('mean')
#no change in column
0    76.0
1     NaN
2    82.0
3    80.0
4     NaN
x['Critic_Score'] = x.groupby('Platform')['Critic_Score']\
    .transform(lambda y: y.fillna(y.mean()))
#no change in column
0    76.0
1     NaN
2    82.0
3    80.0
4     NaN
Name: Critic_Score, dtype: float64
x['Critic_Score']=x.groupby('Platform')['Critic_Score'].apply(lambda y:y.fillna(y.mean()))
​
x['Critic_Score'].head()
​

Out[73]:
0    76.0
1     NaN
2    82.0
3    80.0
4     NaN
Name: Critic_Score, dtype: float64
  • Got it. Sorry. This is what I have: print(x.head()) x = x.update(x.groupby('Platform').transform('mean')) x['Critic_Score'].head() – Kushal Mohnot Aug 29 '20 at 23:36
  • 1
    0 76.0 1 NaN 2 82.0 3 80.0 4 NaN Name: Critic_Score, dtype: float64 --------------------------------------------------------------------------- TypeError Traceback (most recent call last) in 23 x = x.update(x.groupby('Platform').transform('mean')) 24 ---> 25 x['Critic_Score'].head() 26 TypeError: 'NoneType' object is not subscriptable – Kushal Mohnot Aug 29 '20 at 23:37
  • Turns out this has been answered before https://stackoverflow.com/a/19966142/6692898 – RichieV Aug 29 '20 at 23:47
  • So I tried that method, and for some reason it just replaced the values of the 'Critic_Score' with the values of the Year_Released (it was the column next to critic_score in the actual data_frame) – Kushal Mohnot Aug 29 '20 at 23:54
  • @RichieV, I incorporated your suggestion into the question. – Kushal Mohnot Aug 30 '20 at 00:10

1 Answers1

2
x.update(
    x.groupby('Platform').Critic_Score.transform('mean'),
    overwrite=False)
  • First you create a new df with the same number of rows but with the platform average on every row.

  • Then use that to update the original

Bear in mind your sample has only one row of nes and another of gb, both with nan score, so there is nothing to be averaged

RichieV
  • 5,103
  • 2
  • 11
  • 24