0

I'm working with a data frame in Python that has a lot of NAs. I'd like to count the number of NAs per variable. I looked through the documentation and found count(), but it gives me the opposite of what I want, the number of non-NA values:

df.groupby("var1").count()

My question is, how can I count only the NAs within each group instead? I tried:

df.groupby("var1").isnull() or df.groupby("var1").isna() 

or

df.groupby("var1").apply(isnull)

but that gives me errors.

What I'd like to do is this: group the data frame by one variable (citizenship in this case) and then count the number of NAs in the other columns for every level of that variable.

I'd like the output to look like the screenshot, but with the number of NAs in each cell instead of the number of non-NA observations:

screenshot

Or, in other words, I am looking for the Python equivalent of this R code:

dat %>% group_by(citizenship) %>% summarise_all(funs(sum(is.na(.))))
darjeely

1 Answer

0

If by 'per variable' you mean for each DataFrame column, you can count the NAs in a single column like this:

df['column name'].isna().sum()
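
To get the count for every column at once, the same idea can be applied to the whole frame. This is only a minimal sketch: the example data and the column names (citizenship, age, income) are made up for illustration and are not taken from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical example data; the column names are assumptions.
df = pd.DataFrame({
    "citizenship": ["A", "A", "B", "B"],
    "age": [30, np.nan, np.nan, np.nan],
    "income": [1000.0, 2000.0, np.nan, 4000.0],
})

# isna() marks every missing cell as True; summing the booleans
# column by column yields the number of NAs per column.
na_per_column = df.isna().sum()
print(na_per_column)
```

The result is a Series indexed by column name, so na_per_column["age"] gives the count for one column.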

If you want to use groupby, you can find a solution here: Pandas count null values in a groupby function
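
For reference, here is one common pattern for counting NAs per group. It is a sketch under assumptions, not necessarily the approach in the linked answer: the example data and column names are hypothetical, and citizenship stands in for the grouping variable from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical example data; the column names are assumptions.
df = pd.DataFrame({
    "citizenship": ["A", "A", "B", "B", "B"],
    "age": [30, np.nan, np.nan, np.nan, 41],
    "income": [1000.0, 2000.0, np.nan, 4000.0, 5000.0],
})

# Build a boolean NA mask for the value columns first, then group that
# mask by the citizenship column and sum the True values in each group.
na_by_group = (
    df.drop(columns="citizenship")
      .isna()
      .groupby(df["citizenship"])
      .sum()
)
print(na_by_group)

# An equivalent, usually slower, variant that aggregates each column
# inside each group with a custom function.
na_by_group_agg = df.groupby("citizenship").agg(lambda s: s.isna().sum())
```

Both results have one row per citizenship level and one column per remaining variable, which roughly mirrors what the R summarise_all call in the question produces. Calling isna() directly on the groupby object fails because a GroupBy object does not offer that method, which is why the mask is computed before grouping.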

Gray-lab