42
import pandas as pd
import numpy as np

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
                   'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
                   'C': [np.nan, 'bla2', np.nan, 'bla3', np.nan, np.nan, np.nan, np.nan]})

Output:

     A      B     C
0  foo    one   NaN
1  bar    one  bla2
2  foo    two   NaN
3  bar  three  bla3
4  foo    two   NaN
5  bar    two   NaN
6  foo    one   NaN
7  foo  three   NaN

I would like to use groupby to count the number of NaNs in column C for each combination of A and B.

Expected Output (EDIT):

     A      B     C    D
0  foo    one   NaN    2
1  bar    one  bla2    0
2  foo    two   NaN    2
3  bar  three  bla3    0
4  foo    two   NaN    2
5  bar    two   NaN    1
6  foo    one   NaN    2
7  foo  three   NaN    1

Currently I am trying this:

df['count']=df.groupby(['A'])['B'].isnull().transform('sum')

But this is not working...

Thank You

Stefan

3 Answers

51

I think you need groupby with the sum of the NaN values:

df2 = df.C.isnull().groupby([df['A'],df['B']]).sum().astype(int).reset_index(name='count')
print(df2)
     A      B  count
0  bar    one      0
1  bar  three      0
2  bar    two      1
3  foo    one      2
4  foo  three      1
5  foo    two      2

Notice that .isnull() is called on the original DataFrame column, not on the groupby object. A GroupBy object has no .isnull() method, so the null check has to be done on the column first; grouping the resulting boolean mask and summing it then gives the NaN count per group.
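The same idea written out in two steps, which may make the order of operations clearer (a minimal sketch using the sample df from the question):

# build the boolean NaN mask on the column first...
mask = df['C'].isnull()

# ...then group that mask by the key columns and sum the True values
df2 = mask.groupby([df['A'], df['B']]).sum().astype(int).reset_index(name='count')
print(df2)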

If you need to filter first, add boolean indexing:

df = df[df['A'] == 'foo']
df2 = df.C.isnull().groupby([df['A'],df['B']]).sum().astype(int)
print(df2)
A    B    
foo  one      2
     three    1
     two      2

Or simpler:

df = df[df['A'] == 'foo']
df2 = df['B'].value_counts()
print(df2)
one      2
two      2
three    1
Name: B, dtype: int64

EDIT: The solution is very similar, just add transform:

df['D'] = df.C.isnull().groupby([df['A'],df['B']]).transform('sum').astype(int)
print(df)
     A      B     C  D
0  foo    one   NaN  2
1  bar    one  bla2  0
2  foo    two   NaN  2
3  bar  three  bla3  0
4  foo    two   NaN  2
5  bar    two   NaN  1
6  foo    one   NaN  2
7  foo  three   NaN  1

Similar solution:

df['D'] = df.C.isnull()
df['D'] = df.groupby(['A','B'])['D'].transform('sum').astype(int)
print(df)
     A      B     C  D
0  foo    one   NaN  2
1  bar    one  bla2  0
2  foo    two   NaN  2
3  bar  three  bla3  0
4  foo    two   NaN  2
5  bar    two   NaN  1
6  foo    one   NaN  2
7  foo  three   NaN  1
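An equivalent one-liner, assuming your pandas version accepts a lambda in transform, counts the NaNs directly inside the grouped transform (a sketch):

# count NaNs in C per (A, B) group and broadcast the count back to every row
df['D'] = df.groupby(['A', 'B'])['C'].transform(lambda x: x.isnull().sum()).astype(int)
print(df)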
jezrael
  • That answered my initial question perfectly, but I just realized that I had the 'Expected answer' wrong. Sorry about that. I need the result to be added to the initial dataframe. – Stefan Apr 10 '17 at 11:42
  • The `name` argument in `.reset_index(name='count')` doesn't seem to be supported: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.reset_index.html – jmatsen Jul 15 '20 at 20:06
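If `name=` is not accepted by your pandas version, a possible workaround is to name the Series before resetting the index (a sketch):

# rename the summed boolean Series to 'count', then turn the group keys back into columns
df2 = df.C.isnull().groupby([df['A'], df['B']]).sum().astype(int).rename('count').reset_index()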
34
df[df.A == 'foo'].groupby('B').agg({'C': lambda x: x.isnull().sum()})

returns:

       C
B       
one    2
three  1
two    2
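With named aggregation (available since pandas 0.25), a variant that also gives the result column an explicit name might look like this (a sketch):

# same NaN count in C per B, but the output column is labelled nan_count
df[df.A == 'foo'].groupby('B').agg(nan_count=('C', lambda x: x.isnull().sum()))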
tagoma
  • This is the cleanest solution in pandas to my taste. Btw in R data.table it would be only `df[A=='foo', sum(is.na(C)), b]` – yImI Jul 20 '21 at 21:14
3

Just add the parameter dropna=False:

df.groupby(['A', 'B', 'C'], dropna=False).size()

Check the documentation: dropna : bool, default True. If True, and if group keys contain NA values, NA values together with row/column will be dropped. If False, NA values will also be treated as the key in groups.
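Note that the dropna= parameter of groupby requires pandas 1.1 or newer. On the sample df from the question, the call above should produce group sizes along these lines (the NaN key is kept):

print(df.groupby(['A', 'B', 'C'], dropna=False).size())
# A    B      C
# bar  one    bla2    1
#      three  bla3    1
#      two    NaN     1
# foo  one    NaN     2
#      three  NaN     1
#      two    NaN     2
# dtype: int64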

user3590035