Pandas - Replace NaNs in a column with the mean of specific group

Question

I am working with data like the following. The dataframe is sorted by the date:

category  value  Date
0         1      24/5/2019
1         NaN    24/5/2019
1         1      26/5/2019
2         2      1/6/2019
1         2      23/7/2019
2         NaN    18/8/2019
2         3      20/8/2019
7         3      1/9/2019
1         NaN    12/9/2019
2         NaN    13/9/2019
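To try the answers, you can rebuild the sample above as a DataFrame. This is a sketch that assumes the dates are day/month/year (entries such as 24/5/2019 only make sense that way), so it parses them with an explicit format instead of letting pandas guess:

```python
import numpy as np
import pandas as pd

# Sample data from the question; Date starts out as day/month/year strings
df = pd.DataFrame({
    "category": [0, 1, 1, 2, 1, 2, 2, 7, 1, 2],
    "value": [1, np.nan, 1, 2, 2, np.nan, 3, 3, np.nan, np.nan],
    "Date": ["24/5/2019", "24/5/2019", "26/5/2019", "1/6/2019", "23/7/2019",
             "18/8/2019", "20/8/2019", "1/9/2019", "12/9/2019", "13/9/2019"],
})

# An explicit format makes "1/6/2019" parse as 1 June, not 6 January
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y")

# The "previous mean" logic depends on row order, so check the frame is date-sorted
print(df["Date"].is_monotonic_increasing)
print(df)
```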

I would like to replace the "NaN" values with the previous mean for that specific category.

What is the best way to do this in pandas?

Some approaches I considered:

1) This little riff:

   df['mean'] = df.groupby('category')['value'].transform(lambda x: x.shift().expanding().mean())

source

This gets me the correct means, but in another column, and it does not replace the NaNs.

2) This riff replaces the NaNs with the mean of each whole column:

df = df.groupby(df.columns, axis = 1).transform(lambda x: x.fillna(x.mean()))

Source 2
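To see why neither attempt is quite right, here is a small sketch comparing three fills on the value column: one overall column mean (roughly the effect of the second attempt on that column), a per-category mean over the whole group, and a per-category mean over earlier rows only. The variable names are just for illustration, and the rows are assumed to be in date order:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "category": [0, 1, 1, 2, 1, 2, 2, 7, 1, 2],
    "value": [1, np.nan, 1, 2, 2, np.nan, 3, 3, np.nan, np.nan],
})

# One mean for the whole column: ignores both category and date
column_mean = df["value"].fillna(df["value"].mean())

# Group-aware, but averages every value in the group, including later ones
group_mean = df["value"].fillna(df.groupby("category")["value"].transform("mean"))

# What the question asks for: only earlier values in the same category
previous_mean = df["value"].fillna(
    df.groupby("category")["value"].transform(lambda s: s.shift().expanding().mean())
)

print(pd.DataFrame({"column_mean": column_mean,
                    "group_mean": group_mean,
                    "previous_mean": previous_mean}))
```

Row 5 shows the difference: the whole-group mean for category 2 is 2.5 because it includes the later value 3, while the previous-only mean is 2.0.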

Neither of these gives exactly what I want. If someone could guide me on this, it would be much appreciated!

What exactly do you mean by the "previous mean for that specific category"? The categories repeat. – Erfan, Sep 15 '19 at 11:23
By this I mean: for every NaN value, look at the corresponding category, find the mean of that category across all previous dates, then replace the NaN with this calculated mean. – Convex Leopard, Sep 15 '19 at 11:25
`df['value']=df['value'].fillna(df.groupby('category')['value'].transform(lambda x: x.shift().expanding().mean()))`? – ansev, Sep 15 '19 at 11:42
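That definition can be spelled out as a plain loop. This is a slow but explicit sketch, mainly useful for checking faster solutions against. `fill_with_previous_group_mean` is a hypothetical helper, it assumes the rows are already in date order, and it averages only the originally observed earlier values (not values that were themselves filled), which is also what the shift + expanding approach does:

```python
import numpy as np
import pandas as pd

def fill_with_previous_group_mean(df, group_col="category", value_col="value"):
    """Hypothetical helper: replace each NaN with the mean of the earlier
    non-NaN values in the same group. Assumes rows are already in date order."""
    filled = df[value_col].copy()
    history = {}  # group label -> earlier observed (non-NaN) values
    for idx, group, val in zip(df.index, df[group_col], df[value_col]):
        seen = history.setdefault(group, [])
        if pd.isna(val):
            # No earlier value in this group means there is nothing to average
            filled.at[idx] = np.mean(seen) if seen else np.nan
        else:
            seen.append(val)
    return filled

df = pd.DataFrame({
    "category": [0, 1, 1, 2, 1, 2, 2, 7, 1, 2],
    "value": [1, np.nan, 1, 2, 2, np.nan, 3, 3, np.nan, np.nan],
})
filled = fill_with_previous_group_mean(df)
print(filled.tolist())
```

On the sample data this gives the same value column as the answers: row 1 stays NaN, rows 5, 8 and 9 become 2.0, 1.5 and 2.5.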

score 4 · Answer 1 · edited Sep 15 '19 at 11:51

You can replace the NaNs in value with a new Series built from shift + expanding + mean. The first value of group 1 is not replaced, because there are no previous non-NaN values in that group:

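import pandas as pd

# Dates are day/month/year, so give the format explicitly
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')
# Mean of the earlier values in each category; group_keys=False keeps the original index
s = df.groupby('category', group_keys=False)['value'].apply(lambda x: x.shift().expanding().mean())
df['value'] = df['value'].fillna(s)
print(df)
   category  value       Date
0         0    1.0 2019-05-24
1         1    NaN 2019-05-24
2         1    1.0 2019-05-26
3         2    2.0 2019-06-01
4         1    2.0 2019-07-23
5         2    2.0 2019-08-18
6         2    3.0 2019-08-20
7         7    3.0 2019-09-01
8         1    1.5 2019-09-12
9         2    2.5 2019-09-13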
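To see what each step does, here is a sketch that keeps the intermediate results as extra columns (`shifted`, `prev_mean` and `filled` are illustrative names). It passes `group_keys=False` so that `apply` returns a Series on the original row index; recent pandas versions otherwise prepend the group key to the index for this kind of `apply`:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "category": [0, 1, 1, 2, 1, 2, 2, 7, 1, 2],
    "value": [1, np.nan, 1, 2, 2, np.nan, 3, 3, np.nan, np.nan],
})

g = df.groupby("category", group_keys=False)["value"]

# Step 1: shift within each category so a row never sees its own value
df["shifted"] = g.shift()

# Step 2: running mean of the shifted values per category (NaNs are skipped)
df["prev_mean"] = g.apply(lambda s: s.shift().expanding().mean())

# Step 3: only the NaNs in "value" take the running mean
df["filled"] = df["value"].fillna(df["prev_mean"])

print(df[df["category"] == 2])
```

Within category 2, `shifted` holds the previous observed value, `prev_mean` averages all of those so far, and only the NaN rows of `value` pick it up.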

Thanks, this is some awesome Python magic. It helped me learn a few new ways to do things. – Convex Leopard, Sep 15 '19 at 11:47

score 2 · Answer 2 · ansev · 2019-09-15T11:48:41.517

You can use pandas.Series.fillna to replace NaN values:

df['value']=df['value'].fillna(df.groupby('category')['value'].transform(lambda x: x.shift().expanding().mean()))
print(df)

   category  value       Date
0         0    1.0  24/5/2019
1         1    NaN  24/5/2019
2         1    1.0  26/5/2019
3         2    2.0   1/6/2019
4         1    2.0  23/7/2019
5         2    2.0  18/8/2019
6         2    3.0  20/8/2019
7         7    3.0   1/9/2019
8         1    1.5  12/9/2019
9         2    2.5  13/9/2019
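Both answers rely on the frame already being sorted by date, as the question states. If that is not guaranteed, one option is to compute the running mean on a date-sorted copy and let `fillna` align the result back by index label. A sketch, with the rows shuffled on purpose to simulate unsorted input and the dates assumed to be day/month/year:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "category": [0, 1, 1, 2, 1, 2, 2, 7, 1, 2],
    "value": [1, np.nan, 1, 2, 2, np.nan, 3, 3, np.nan, np.nan],
    "Date": ["24/5/2019", "24/5/2019", "26/5/2019", "1/6/2019", "23/7/2019",
             "18/8/2019", "20/8/2019", "1/9/2019", "12/9/2019", "13/9/2019"],
})
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y")

# Shuffle the rows to simulate input that is not in date order
df = df.sample(frac=1, random_state=0)

# Compute the previous-mean Series on a date-sorted copy of the data
ordered = df.sort_values("Date", kind="stable")
prev_mean = ordered.groupby("category")["value"].transform(
    lambda s: s.shift().expanding().mean()
)

# fillna aligns on index labels, so the shuffled row order of df does not matter
df["value"] = df["value"].fillna(prev_mean)
print(df.sort_index())
```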

answered Sep 15 '19 at 11:42 by ansev · edited Sep 15 '19 at 11:48


Thanks, this is a great way to approach it; I learnt something new here! – Convex Leopard Sep 15 '19 at 11:48

I would appreciate it if you upvoted, @ConvexLeopard – ansev Sep 15 '19 at 14:57
Sure, I have upvoted (apologies, I didn't know I could upvote). Thanks for your help! – Convex Leopard Sep 15 '19 at 15:00
