Counting the number of NAs in df.groupby (Python)

Question

I'm working with a data frame in Python that has a lot of NAs. I'd like to count the number of NAs per variable. I've looked through the documentation and found count(), but it gives me the opposite of what I want (the number of non-NA values):

df.groupby("var1").count()

My question is: how can I count only the NAs in a groupby instead? I tried:

df.groupby("var1").isnull() or df.groupby("var1").isna()

or

df.groupby("var1").apply(isnull)

but that gives me errors.

What I'd like to do is this: group the data frame by one variable (citizenship in this case) and then count the number of NAs in each column for every level of that factor.

I'd like output like the one in the screenshot, but showing the number of NAs instead of the number of non-NA observations:

screenshot

In other words, I am looking for the Python equivalent of this R code:

dat %>% group_by(citizenship) %>% summarise_all(funs(sum(is.na(.))))
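For comparison, here is a minimal pandas sketch of what that R pipeline computes, assuming a data frame `df` with a `citizenship` column. The other columns (`age`, `income`) and their values are made up for illustration. The idea is to flag missing cells with `isna()`, group those flags by citizenship, and sum them, since `True` counts as 1.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: "age" and "income" are illustrative columns only.
df = pd.DataFrame({
    "citizenship": ["DE", "DE", "FR", "FR", "FR"],
    "age": [34, np.nan, 29, np.nan, np.nan],
    "income": [np.nan, 52000, 48000, 51000, np.nan],
})

# Rough equivalent of group_by(citizenship) %>% summarise_all(funs(sum(is.na(.)))):
# mark every missing cell as True, group the flags by citizenship,
# and sum them (True counts as 1) to get NA counts per column per group.
na_counts = (
    df.drop(columns="citizenship")
      .isna()
      .groupby(df["citizenship"])
      .sum()
)
print(na_counts)
```

Rows whose citizenship is itself missing are dropped from the groups by default; passing `dropna=False` to `groupby` (pandas 1.1 or later) keeps them as a separate group.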

Can you include an example of your dataframe in your question? — Gray-lab, Apr 29 '22 at 17:35

score 0 · Answer 1 · answered Apr 29 '22 at 17:36


If by 'per variable' you mean for each data frame column, you can do this:

df['column name'].isna().sum()
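Applied to the whole frame rather than a single column, the same call gives one NA count per column. A small sketch with made-up data (the column names and values are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical data frame with some missing values (None and NaN both count as NA).
df = pd.DataFrame({
    "citizenship": ["DE", None, "FR"],
    "age": [34, np.nan, np.nan],
})

# isna() returns a boolean frame of the same shape; sum() adds up each column.
na_per_column = df.isna().sum()
print(na_per_column)
```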

If you want to use groupby, you can find a solution in the question "Pandas count null values in a groupby function".
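One way to do it with `groupby` directly, as a sketch: aggregate each column with a function that counts its missing values. The grouping column `var1` comes from the question; `x`, `y` and the values are made up. Since `count()` returns the number of non-missing values per group, subtracting it from the group size gives the same numbers, which makes a handy cross-check.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "var1" is the grouping column from the question.
df = pd.DataFrame({
    "var1": ["a", "a", "b", "b"],
    "x": [1.0, np.nan, np.nan, np.nan],
    "y": [np.nan, 2.0, 3.0, 4.0],
})

grouped = df.groupby("var1")

# Count missing values in every non-grouping column, per group.
na_counts = grouped.agg(lambda s: s.isna().sum())

# Cross-check: group size minus the non-missing count returned by count().
via_count = grouped.count().rsub(grouped.size(), axis=0)

print(na_counts)
```

Because the lambda is called once per column per group, on large frames a vectorised form such as `df.drop(columns="var1").isna().groupby(df["var1"]).sum()` may be faster.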

– Gray-lab

Thank you, but this is not what I'm looking for. I just edited the question! :) – darjeely May 01 '22 at 11:43

