
I tried to replace the NaN values in the column feature_count (an integer that ranges from 1 to 10) with group means, using groupby on client_id or client_name. However, the NaN values do not go away.

df['feature_count'].isnull().sum()

The output is:

2254

Now I use:

df['feature_count'].fillna(df.groupby('client_name')['feature_count'].mean(), inplace=True)

But the output remains the same:

df['feature_count'].isnull().sum()

2254

Is there another way to replace the NaN values with the mean of the non-NaN values in the column, grouped by client ID?

jpp
Krishna Dhruv
  • The code you use looks erroneous, especially the inplace=True part. Try to get the mean first (in a variable). When you achieve that you fill. And if you want us to solve this problem you should provide a sample of your code according to [mcve]. – Anton vBR Jun 22 '18 at 16:05
  • I have 500 client IDs, which means I would have to find the mean 500 times. Isn't that a lot of work? – Krishna Dhruv Jun 22 '18 at 16:06

1 Answer


df.groupby('client_name')['feature_count'].mean() returns a series.

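But you aren't looking to replace null values with that series directly. fillna aligns a series argument on index labels, and this series is indexed by client_name rather than by the row index of df, so nothing matches and the nulls stay. Instead, you want to replace each null value with the mean mapped from the series via that row's client_name.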

Therefore, you can use the following:

s = df.groupby('client_name')['feature_count'].mean()
# assign back; inplace=True on a selected column is unreliable in recent pandas
df['feature_count'] = df['feature_count'].fillna(df['client_name'].map(s))
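
To make the alignment point concrete, here is a minimal sketch on made-up data (the five rows and the helper variable names are assumptions for illustration; only client_name and feature_count come from the question). It compares filling with the raw group-mean series against filling with the mapped one:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column names follow the question.
df = pd.DataFrame({
    "client_name": ["a", "a", "a", "b", "b"],
    "feature_count": [2.0, 4.0, np.nan, 10.0, np.nan],
})

# Group means, indexed by client_name ("a", "b"), not by row label (0..4).
s = df.groupby("client_name")["feature_count"].mean()

# fillna aligns on index labels, so nothing matches and the NaNs remain.
unaligned = df["feature_count"].fillna(s)
remaining_unaligned = int(unaligned.isnull().sum())

# map looks up each row's client_name in s, giving a row-aligned Series.
row_means = df["client_name"].map(s)
filled = df["feature_count"].fillna(row_means)
remaining_mapped = int(filled.isnull().sum())

print(remaining_unaligned, remaining_mapped)
```

The first count should stay at the original number of nulls, which matches what the question reports, while the mapped version leaves none.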

Even more Pandorable would be to use groupby with transform, which handles the mapping part for you by returning the group means already aligned to the rows of df:

s = df.groupby('client_name')['feature_count'].transform('mean')
# assign back; inplace=True on a selected column is unreliable in recent pandas
df['feature_count'] = df['feature_count'].fillna(s)
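
One case worth checking is a client with no observed values at all: its group mean is itself NaN, so the fill leaves those rows untouched. The sketch below uses made-up data, and the fallback to the overall column mean is only an assumption about what might be acceptable, not something the question asks for:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; client "c" has no observed values at all.
df = pd.DataFrame({
    "client_name": ["a", "a", "a", "b", "b", "c"],
    "feature_count": [2.0, 5.0, np.nan, 10.0, np.nan, np.nan],
})

# transform returns a Series with the same index as df,
# holding each row's group mean.
group_means = df.groupby("client_name")["feature_count"].transform("mean")

# Fill from the row-aligned group means and assign the result back.
df["feature_count"] = df["feature_count"].fillna(group_means)

# Client "c" has a NaN group mean, so its row is still missing.
still_missing = int(df["feature_count"].isnull().sum())

# Assumption: fall back to the overall mean for clients with no data.
overall_mean = df["feature_count"].mean()
df["feature_count"] = df["feature_count"].fillna(overall_mean)
```

Note that group means are generally not whole numbers, so if feature_count must stay an integer, some rounding step would be needed afterwards.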
jpp