
I know that the fillna() method can be used to fill NaN values across a whole DataFrame.

df.fillna(df.mean()) # fill with mean of column.

How can I limit the mean calculation to the group (and the column) in which the NaN occurs?

Example:

import pandas as pd 
import numpy as np 

df = pd.DataFrame({
    'a': pd.Series([1,1,1,2,2,2]),
    'b': pd.Series([1,2,np.nan,1,np.nan,4])
})

print(df)

Input

   a   b
0  1   1
1  1   2
2  1 NaN
3  2   1
4  2 NaN
5  2   4

Desired output (after grouping by 'a' and replacing each NaN with the mean of its group)

   a    b
0  1  1.0
1  1  2.0
2  1  1.5
3  2  1.0
4  2  2.5
5  2  4.0
Ghilas BELHADJ

2 Answers


If I understand correctly, you can call fillna on 'b' and pass it the result of a groupby on 'a' followed by transform('mean') on 'b':

In [44]:
df['b'] = df['b'].fillna(df.groupby('a')['b'].transform('mean'))
df

Out[44]:
   a    b
0  1  1.0
1  1  2.0
2  1  1.5
3  2  1.0
4  2  2.5
5  2  4.0
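To see why this works, here is a minimal sketch that splits the one-liner into two steps. The variable names group_mean and filled are only for illustration and do not appear in the answer above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': [1, 1, 1, 2, 2, 2],
    'b': [1, 2, np.nan, 1, np.nan, 4],
})

# transform('mean') returns one value per original row (the mean of that
# row's group), so the result shares df's index. NaN is skipped in the mean.
group_mean = df.groupby('a')['b'].transform('mean')

# fillna aligns on the index and only replaces the NaN positions in 'b'.
filled = df['b'].fillna(group_mean)

print(group_mean)
print(filled)
```

Because the transformed Series has the same length and index as the original column, no merge or manual lookup is needed.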

If several columns contain NaN values, I think the following should work:

In [47]:
df.fillna(df.groupby('a').transform('mean'))

Out[47]:
   a    b
0  1  1.0
1  1  2.0
2  1  1.5
3  2  1.0
4  2  2.5
5  2  4.0

EDIT

In [49]:
df = pd.DataFrame({
    'a': pd.Series([1,1,1,2,2,2]),
    'b': pd.Series([1,2,np.nan,1,np.nan,4]),
    'c': pd.Series([1,np.nan,np.nan,1,np.nan,4]),
    'd': pd.Series([np.nan,np.nan,np.nan,1,np.nan,4])
})
df

Out[49]:
   a   b   c   d
0  1   1   1 NaN
1  1   2 NaN NaN
2  1 NaN NaN NaN
3  2   1   1   1
4  2 NaN NaN NaN
5  2   4   4   4

In [50]:
df.fillna(df.groupby('a').transform('mean'))

Out[50]:
   a    b    c    d
0  1  1.0  1.0  NaN
1  1  2.0  1.0  NaN
2  1  1.5  1.0  NaN
3  2  1.0  1.0  1.0
4  2  2.5  2.5  2.5
5  2  4.0  4.0  4.0

Column 'd' is still NaN for group 1 because every value of 'd' in that group is NaN, so there is no group mean to fill with.
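If leftover NaN values in an all-NaN group are a problem, one possible workaround is a second fillna with the overall column mean. This is only a sketch under that assumption; whether a global mean is a sensible fallback depends on your data, and the names by_group and result are made up for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': [1, 1, 1, 2, 2, 2],
    'd': [np.nan, np.nan, np.nan, 1, np.nan, 4],
})

# Step 1: fill with the per-group mean. Group 1 has no non-NaN values in 'd',
# so its group mean is NaN and those rows stay NaN.
by_group = df.fillna(df.groupby('a').transform('mean'))

# Step 2 (assumed fallback): fill whatever is left with the overall column mean.
result = by_group.fillna(df.mean())

print(result)
```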

EdChum

We first compute the group means, ignoring the missing values:

group_means = df.groupby('a')['b'].agg(lambda v: np.nanmean(v))

Next, we group by 'a' again and fill the NaN values in each group with that group's mean:

df_new = df.groupby('a').apply(lambda t: t.fillna(group_means.loc[t['a'].iloc[0]]))
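The same two-step idea (compute the group means, then look them up per row) can also be sketched without apply, by mapping each row's key in 'a' to its group mean. This is a variation, not the code from the answer above; it relies on mean() skipping NaN by default, which gives the same result as np.nanmean here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': [1, 1, 1, 2, 2, 2],
    'b': [1, 2, np.nan, 1, np.nan, 4],
})

# One mean per group, indexed by the values of 'a'; NaN is skipped by default.
group_means = df.groupby('a')['b'].mean()

# Map each row's key in 'a' to its group mean, then fill only the NaNs in 'b'.
# assign returns a new DataFrame and leaves df unchanged.
df_new = df.assign(b=df['b'].fillna(df['a'].map(group_means)))

print(df_new)
```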
Boris Gorelik