I'm working with a data frame in Python that has a lot of NAs. I'd like to count the number of NAs per variable. I've looked through the documentation and found count(), except it gives me the opposite of what I want, since it counts the non-NA values:
df.groupby("var1").count()
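Here is a minimal sketch of that behaviour on hypothetical toy data (the columns a and b and their values are invented for illustration). count() tallies only the non-missing values in each group. Subtracting it from size(), the row count per group, gives the NA counts. This is the "observations minus non-NAs" relationship written out explicitly:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: "var1" is the grouping column, "a" and "b" contain NaNs.
df = pd.DataFrame({
    "var1": ["x", "x", "y", "y", "y"],
    "a": [1.0, np.nan, 3.0, np.nan, np.nan],
    "b": [np.nan, 2.0, 3.0, 4.0, 5.0],
})

# count() tallies the non-NA values of each column within each group.
non_na = df.groupby("var1").count()

# size() is the number of rows per group; subtracting the non-NA counts
# (aligned on the group index) leaves the number of NAs per group and column.
na_counts = non_na.rsub(df.groupby("var1").size(), axis=0)
print(na_counts)
```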
My question is: how can I count only the NAs within each group instead? I tried:
df.groupby("var1").isnull()
or
df.groupby("var1").isna()
or
df.groupby("var1").apply(isnull)
but all of these give me errors.
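A likely reason these attempts fail is that isnull()/isna() are methods of DataFrame and Series, not of the GroupBy object. A bare isnull is also undefined unless it has been imported. The sketch below shows one way to get the intended effect, using the same hypothetical toy data. It passes a callable to agg(), which pandas applies per column and per group to a Series, so isna() is called on a Series rather than on the GroupBy:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with a grouping column "var1".
df = pd.DataFrame({
    "var1": ["x", "x", "y", "y", "y"],
    "a": [1.0, np.nan, 3.0, np.nan, np.nan],
    "b": [np.nan, 2.0, 3.0, 4.0, 5.0],
})

# agg() calls the lambda once per (group, column) pair with a Series,
# and isna().sum() counts the missing entries in that Series.
na_counts = df.groupby("var1").agg(lambda s: s.isna().sum())
print(na_counts)
```

The lambda runs in Python once per group and column, so it can be slower than a vectorised approach on large frames.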
What I'd like to do is this: group the data frame by one variable (citizenship in this case) and then count the number of NAs in each column for every level of that variable.
I'd like the output to look like the screenshot, but showing the number of NAs rather than the number of observations minus the number of NAs.
In other words, I'm looking for the Python equivalent of this R code:
dat %>% group_by(citizenship) %>% summarise_all(funs(sum(is.na(.))))
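A minimal sketch of one possible pandas translation of that pipeline follows. Only the citizenship column comes from the question; dat's other columns (age, income) and all values are hypothetical. The idea is to build the boolean is-NA mask first, then group it by citizenship and sum, since each True counts as 1:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `dat`; only "citizenship" comes from the question.
dat = pd.DataFrame({
    "citizenship": ["DE", "DE", "FR", "FR", "FR"],
    "age": [34.0, np.nan, 51.0, np.nan, np.nan],
    "income": [np.nan, 2000.0, np.nan, 1500.0, 1800.0],
})

# is.na(.)  -> isna() on every non-grouping column
# group_by  -> groupby() on the citizenship Series (aligned by row index)
# sum(...)  -> sum(), which counts True values as 1
na_by_citizenship = (
    dat.drop(columns="citizenship")
    .isna()
    .groupby(dat["citizenship"])
    .sum()
)
print(na_by_citizenship)
```

Grouping the mask by the external citizenship Series keeps the grouping column out of the counted columns. This mirrors how summarise_all leaves the group_by variable out of the summary.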