0

Given the following data frame:

import pandas as pd
import numpy as np
df = pd.DataFrame({'Site':['A','A','A','B','B','B','C','C','C'],
                   'Value':[np.nan,1,np.nan,np.nan,2,2,3,np.nan,3]})

df

    Site    Value
0   A       NaN
1   A       1.0
2   A       NaN
3   B       NaN
4   B       2.0
5   B       2.0
6   C       3.0
7   C       NaN
8   C       3.0

I'd like to fill the NaN values with the most common (median or mean will do) value for the site. The desired result is:

    Site    Value
0   A       1.0
1   A       1.0
2   A       1.0
3   B       2.0
4   B       2.0
5   B       2.0
6   C       3.0
7   C       3.0
8   C       3.0

Thanks in advance!

Update: This is close, but no cigar:

df['Value']=df.groupby(['Site'])['Value'].fillna(min)

resulting in...

    Site    Value
0   A   <function amax at 0x108cf9048>
1   A   1
2   A   <function amax at 0x108cf9048>
3   B   <function amax at 0x108cf9048>
4   B   2
5   B   2
6   C   3
7   C   <function amax at 0x108cf9048>
8   C   3
Dance Party
  • 3,459
  • 10
  • 42
  • 67
  • 1
    you can find the answer here: https://stackoverflow.com/questions/19966018/pandas-filling-missing-values-by-mean-in-each-group?rq=1 – dmb Mar 24 '16 at 00:52

1 Answers1

5

You can use transform as answered here

df['Value'] = df.groupby('Site').transform(lambda x: x.fillna(x.mean()))


  Site  Value
0    A      1
1    A      1
2    A      1
3    B      2
4    B      2
5    B      2
6    C      3
7    C      3
8    C      3
dmb
  • 1,669
  • 1
  • 12
  • 21
  • 3
    In the case where there is >1 column: df['Value'] = df.groupby('Site')['Value'].transform(lambda x: x.fillna(x.max())) – Dance Party Mar 24 '16 at 01:09
  • 1
    That's fine of course, you could also shorten to: `df.groupby('Site').transform('mean')` – JohnE Mar 24 '16 at 01:13