Pandas fillna by mean of each Group

Question

I have a pandas DataFrame with several columns. I'd like to fill the NaNs in selected columns with the mean of each group.

import pandas as pd
import numpy as np

df = pd.DataFrame({
                   'cat': ['A','A','A','B','B','B','C','C'],
                   'v1': [10, 12, np.nan, 10, 14, np.nan, 11, np.nan],
                   'v2': [12, 8, np.nan, np.nan, 6, 12, 10, np.nan]
                 })

I am looking for a solution that's scalable, meaning I can apply the operation to several columns.

The np.nan values should be filled with the mean of each group.

Expected output:

cat  v1  v2
A    10  12
A    12   8
A    11  10
B    10   9
B    14   6
B    12  12
C    11  10
C    11  10

Other similar questions are limited to a single column. I am looking for a solution that generalizes and imputes missing values across several columns.

It seems like you're looking for something like [this answer](https://stackoverflow.com/a/65483740/15497888) or [this answer](https://stackoverflow.com/a/65394359/15497888). [The canonical](https://stackoverflow.com/a/53339320/15497888) also works if you just specify the columns on the groupby. `cols = ['v1', 'v2']` then `df[cols] = df[cols].fillna(df.groupby('cat')[cols].transform('mean'))` — Henry Ecker, May 22 '22 at 21:34

score 1 · Answer 1 · answered May 20 '22 at 16:54

This will replace all of the np.nan's with the mean of the column

import pandas as pd
import numpy as np

df = pd.DataFrame({
                   'cat': ['A','A','A','B','B','B','C','C'],
                   'v1': [10, 12, np.nan, 10, 14, np.nan, 11, np.nan],
                   'v2': [12, 8, np.nan, np.nan, 6, 12, 10, np.nan]
                 })

for x in df.columns.drop('cat'):
    mean_of_column = df[x].mean()
    # Assign the result back; fillna(inplace=True) on df[x] may act on a temporary copy
    df[x] = df[x].fillna(mean_of_column)
df

Please note that the columns will be float: NaN already forces a float dtype, and the mean is generally not a whole number. If you want, you can round or cast the columns afterwards to remove the decimals.
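To illustrate the last point, here is a minimal sketch of dropping the decimals after filling. It assumes rounding to the nearest whole number is acceptable and that no NaN remains in the columns (a plain integer dtype cannot hold NaN); `value_cols` is just an illustrative name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'cat': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C'],
    'v1': [10, 12, np.nan, 10, 14, np.nan, 11, np.nan],
    'v2': [12, 8, np.nan, np.nan, 6, 12, 10, np.nan],
})

value_cols = df.columns.drop('cat')  # illustrative name: every column except the group key

for col in value_cols:
    # Fill with the overall column mean, as in the loop above
    filled = df[col].fillna(df[col].mean())
    # Round to the nearest whole number, then cast; safe only because no NaN is left
    df[col] = filled.round().astype('int64')
```

Rounding changes the imputed values slightly, so whether this is appropriate depends on how the columns are used later.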

answered May 20 '22 at 16:54

ArchAngelPwn

I don't think this would calculate the mean of the group, it will replace `NAs` with mean of the column. – kms May 20 '22 at 17:02
Did you mean the mean of the group as in df['cat']['A']? – ArchAngelPwn May 20 '22 at 17:07
mean of `df.groupby('cat')` – kms May 20 '22 at 17:09

score 0 · Answer 2 · answered May 23 '22 at 02:30

Try this:

df = df.fillna(df.groupby('cat').transform('mean'))

Output:

  cat    v1    v2
0   A  10.0  12.0
1   A  12.0   8.0
2   A  11.0  10.0
3   B  10.0   9.0
4   B  14.0   6.0
5   B  12.0  12.0
6   C  11.0  10.0
7   C  11.0  10.0
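If only some columns should be imputed, the same idea can be restricted to a column list, along the lines suggested in the comment on the question. A minimal sketch, assuming the sample frame from the question; `cols` and `group_means` are illustrative names:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'cat': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C'],
    'v1': [10, 12, np.nan, 10, 14, np.nan, 11, np.nan],
    'v2': [12, 8, np.nan, np.nan, 6, 12, 10, np.nan],
})

cols = ['v1', 'v2']  # illustrative: the columns to impute

# transform('mean') returns a frame with df's index, holding each row's group mean
group_means = df.groupby('cat')[cols].transform('mean')

# fillna with a DataFrame fills only the missing cells, matched by index and column
df[cols] = df[cols].fillna(group_means)
```

Adding more names to `cols` extends this to any number of numeric columns. If a group has no observed value in a column, its mean is itself NaN, so those cells stay missing.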

answered May 23 '22 at 02:30

rhug123
