0

So I have a pandas dataframe that has 2 columns:

import numpy as np
data = {'Column_A': [1,2,3,4,5,np.nan,6,7,np.nan,np.nan,8,9,10,np.nan], 'Column_B': [1,2,3,4,5,np.nan,6,7,np.nan,np.nan,8,9,10,11]}
df = pd.DataFrame(data)

I wanted to count the number of null and non-null values for each column so I used df.isnull().sum() and df.notnull().sum() and that gives me a list of all my columns with the count of null and not null for each column:

df.isnull().sum()
Column_A    4
Column_B    3
df.notnull().sum()
Column_A    10
Column_B    11

What I want to do is add the count of null and not null for each column to the dataframe and group by all the columns with the original columns called Column Names. So for example the dataframe would look like this:

Column_Names Count_Null Count_NotNull
Column_A     4           10
Column_B     3           11

And so on.

I can run df.isnull().sum() and df.notnull().sum() separately to get the correct counts for all the columns.

But how can I create a table that has the original columns on rows, with the two metrics as additional columns like in the example above?

data = {'Column_A': ['A','B','C','A','C','C','B','B','B','B','A','C','D','A'],'Column_B': [1,2,3,4,5,np.nan,6,7,np.nan,np.nan,8,9,10,np.nan], 'Column_C': [1,2,3,4,5,np.nan,6,7,np.nan,np.nan,8,9,10,11]}
df = pd.DataFrame(data)

NewTable = df.Column_B.isnull().groupby([df['Column_A']]).sum().astype(int).reset_index(name='Count_Null_B')

NewTable
hockeyhorror
  • 57
  • 2
  • 7

1 Answers1

1

I'd do it like this:

names = {False: 'Count_NotNull', True: 'Count_Null'}
result = df.isna().apply(pd.Series.value_counts, dropna=False).rename(names).T

edit - or without apply:

result = df.isna().sum().rename('Count_Null').to_frame()
result['Count_NotNull'] = len(df) - result['Count_Null']
timgeb
  • 76,762
  • 20
  • 123
  • 145
  • @hockeyhorror "there is no column title for the first column" I don't understand that part – timgeb Mar 07 '22 at 17:29
  • So I figured it out. I was trying to add the name 'Field' to the index. I did so by adding ```result.reset_index result.index.name = 'Field'``` – hockeyhorror Mar 07 '22 at 21:51
  • But I was trying to do one more thing. I added some new code at the end of my question and an extra column. I'm trying to add multiple columns that count the nulls in one table. So grouping by Column A and having a count of nulls for column B and column C. The code at the end of my question adds the count of nulls for B. But do you know how I might add a third column to NewTable for the null count of column C? – hockeyhorror Mar 07 '22 at 21:57