How does pandas replace NaN values with mean value using groupby

Question

I am trying to replace the NaN values in the column feature_count (an integer that ranges from 1 to 10) with the group mean, using groupby on client_id or client_name. However, the NaN values do not go away.

df['feature_count'].isnull().sum()

The output is:

2254

Now I use:

df['feature_count'].fillna(df.groupby('client_name')['feature_count'].mean(), inplace=True)

But the output remains the same:

df['feature_count'].isnull().sum()

2254

Is there another way to replace the NaN values with the mean of the non-NaN values in the same column, grouped by ID?

The code you use looks erroneous, especially the inplace=True part. Try to compute the mean first (store it in a variable), then fill. And if you want us to solve this problem, you should provide a sample of your code according to [mcve]. – Anton vBR, Jun 22 '18 at 16:05
I have 500 client IDs, which means I would have to compute the mean 500 times. Isn't that a lot of work? – Krishna Dhruv, Jun 22 '18 at 16:06
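On the concern about 500 client IDs: a single groupby call computes the mean for every group at once, so no per-client loop is needed. A minimal sketch with hypothetical toy data (the client names 'a', 'b', 'c' and the values are invented for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: three clients, some missing feature counts
df = pd.DataFrame({
    'client_name': ['a', 'a', 'b', 'b', 'c', 'c'],
    'feature_count': [2.0, np.nan, 4.0, 6.0, np.nan, 8.0],
})

# One call computes the mean for every client; mean() skips NaN by default,
# so each group's mean uses only its non-NaN values.
group_means = df.groupby('client_name')['feature_count'].mean()
print(group_means)
```

The same call scales to 500 (or 500,000) clients without any extra code.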

score 3 · Accepted Answer · answered Jun 22 '18 at 16:09


df.groupby('client_name')['feature_count'].mean() returns a Series indexed by client_name.

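When fillna is given a Series, it fills values by matching index labels. That Series is indexed by client name, while df['feature_count'] is indexed by row, so no labels match and nothing gets filled. You aren't looking to replace null values with a series; instead, you want to replace each null value with the mean mapped from that series via the row's client_name.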
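A minimal reproduction of the index-alignment problem behind the original attempt, using hypothetical toy data rather than the asker's dataset:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with two clients and two missing values
df = pd.DataFrame({
    'client_name': ['a', 'a', 'b', 'b'],
    'feature_count': [2.0, np.nan, 4.0, np.nan],
})

means = df.groupby('client_name')['feature_count'].mean()
# means is indexed by 'a', 'b', but the column is indexed 0..3.
# fillna(Series) only fills positions whose index labels match,
# so no NaN is replaced.
attempt = df['feature_count'].fillna(means)
print(attempt.isnull().sum())  # 2
```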

Therefore, you can use the following:

s = df.groupby('client_name')['feature_count'].mean()
# map each row's client_name to its group mean, then fill and assign back
df['feature_count'] = df['feature_count'].fillna(df['client_name'].map(s))
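A runnable sketch of the map-based fill on hypothetical toy data. It assigns the result back to the column instead of using inplace=True, since calling an inplace method on a selected column (chained assignment) may not update df in newer pandas versions that use Copy-on-Write:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({
    'client_name': ['a', 'a', 'b', 'b'],
    'feature_count': [2.0, np.nan, 4.0, np.nan],
})

s = df.groupby('client_name')['feature_count'].mean()
# map() looks up each row's client_name in s, producing a Series that shares
# df's row index, so fillna can match it position by position.
df['feature_count'] = df['feature_count'].fillna(df['client_name'].map(s))
print(df['feature_count'].tolist())  # [2.0, 2.0, 4.0, 4.0]
```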

Even more Pandorable is GroupBy.transform, which handles the mapping part for you:

s = df.groupby('client_name')['feature_count'].transform('mean')
df['feature_count'] = df['feature_count'].fillna(s)
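To see why no explicit mapping is needed, here is a small sketch on hypothetical toy data: transform('mean') returns a Series with the same index and length as df, with each row holding its own group's mean.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with interleaved clients
df = pd.DataFrame({
    'client_name': ['a', 'b', 'a', 'b'],
    'feature_count': [2.0, 6.0, np.nan, np.nan],
})

# transform('mean') broadcasts each group's mean back onto every row of
# that group, so the result is already aligned with df's index.
s = df.groupby('client_name')['feature_count'].transform('mean')
print(s.tolist())  # [2.0, 6.0, 2.0, 6.0]

df['feature_count'] = df['feature_count'].fillna(s)
print(df['feature_count'].tolist())  # [2.0, 6.0, 2.0, 6.0]
```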


jpp


I tried what you suggested. The NaN values didn't disappear completely; they dropped from 2254 to 529. – Krishna Dhruv Jun 22 '18 at 16:12
@KrishnaDhruv, I guess some of your groups may be all `NaN`; have a look at your inputs. Otherwise, you'll need to provide a [mcve]. – jpp Jun 22 '18 at 16:12
Yes! Some of the groups are all NaN. Thanks for the insight and the answer! :) – Krishna Dhruv Jun 22 '18 at 16:17
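A group whose values are all NaN has a NaN mean, so the group-wise fill leaves those rows untouched. One possible fallback, which is an assumption on our part and not part of the accepted answer, is to fill the remaining rows with the mean of the whole column. A sketch on hypothetical toy data, where client 'c' has no non-NaN values:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: client 'c' has only NaN values
df = pd.DataFrame({
    'client_name': ['a', 'a', 'b', 'b', 'c'],
    'feature_count': [2.0, np.nan, 4.0, 6.0, np.nan],
})

group_mean = df.groupby('client_name')['feature_count'].transform('mean')
filled = df['feature_count'].fillna(group_mean)
# Client 'c' has a NaN group mean, so its row is still missing.
print(filled.isnull().sum())  # 1

# Assumed fallback (not from the answer): the overall column mean,
# computed over all non-NaN values (here (2 + 4 + 6) / 3 = 4.0).
filled = filled.fillna(df['feature_count'].mean())
print(filled.tolist())  # [2.0, 2.0, 4.0, 6.0, 4.0]
```

Whether a column-wide mean is a sensible default depends on the data; dropping those rows or using another fallback may suit better.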
