I have a df like this:

a   b   c   d
1   1   1   1
2   2   3   NA
3   NA  NA  2 
4   NA  2   NA
5   NA  NA  NA

Is it possible to count how many rows are not missing on all variables, i.e. rows that have at least one non-NA value? For this example it should return 4, as those rows have at least one non-NA value in some of their variables.

EGM8686
  • Shouldn't "a count of how many rows have non missing values on all variables" return 1 for your example data (the first row) ? Maybe you need to explain the logic more clearly. – neilfws Sep 04 '19 at 04:46
  • Yes, it was a typo; I have already edited the question – EGM8686 Sep 04 '19 at 04:57
  • Are you neglecting column `a` from calculation? So `sum(rowSums(!is.na(df[-1])) > 0)`? – Ronak Shah Sep 04 '19 at 05:03
  • *"how many rows doesn't have non missing values on all variables"* Do you mean how many rows have at least one `NA` entry? E.g. `sum(is.na(rowSums(df)))`? – Maurits Evers Sep 04 '19 at 05:22
  • No. In this example the answer should be 4. You need to exclude the `a` variable from the calculations, since that one is the ID number – EGM8686 Sep 04 '19 at 12:47
  • @EGM8686 `sum(is.na(rowSums(df)))` returns 4. So is that what you're after? Your problem statement is not very clear. – Maurits Evers Sep 04 '19 at 22:10

1 Answer

Since it is now clear that we want to ignore column a in the calculation, here are a few ways:

Using rowSums

sum(rowSums(!is.na(df[-1])) > 0)
#[1] 4

#OR

sum(rowSums(is.na(df[-1])) != (ncol(df) - 1))
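
To make the rowSums approach easier to follow, here is a minimal sketch that rebuilds the example data and shows the intermediate per-row counts. The `data.frame` call is an assumption reconstructed from the table in the question, and `non_missing` and `n_rows` are names introduced only for illustration.

```r
# Example data reconstructed from the question (column a is the ID)
df <- data.frame(
  a = 1:5,
  b = c(1, 2, NA, NA, NA),
  c = c(1, 3, NA, 2, NA),
  d = c(1, NA, 2, NA, NA)
)

# Number of non-missing values per row, ignoring the first column
non_missing <- rowSums(!is.na(df[-1]))
non_missing
# values per row: 3 2 1 1 0

# Count rows with at least one non-missing value
n_rows <- sum(non_missing > 0)
n_rows
# 4
```

Only the last row has a count of 0, so it is the only row left out.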

Using apply

sum(apply(!is.na(df[-1]), 1, any))

#OR

sum(!apply(is.na(df[-1]), 1, all))

Using filter_at from dplyr, which keeps the rows that meet a condition; we can then use nrow to count the rows that satisfy our requirement.

library(dplyr)
df %>% filter_at(-1, any_vars(!is.na(.))) %>% nrow()
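
In newer dplyr releases the scoped verbs such as filter_at are superseded, so the same count can probably be obtained with `filter()` and `if_any()`. This is a sketch that assumes dplyr 1.0.4 or later is installed; the `data.frame` call is reconstructed from the question, and `n_rows` is a name used only for illustration.

```r
library(dplyr)

# Example data reconstructed from the question (column a is the ID)
df <- data.frame(
  a = 1:5,
  b = c(1, 2, NA, NA, NA),
  c = c(1, 3, NA, 2, NA),
  d = c(1, NA, 2, NA, NA)
)

# Keep rows where any column other than a is non-missing, then count them
n_rows <- df %>%
  filter(if_any(-a, ~ !is.na(.x))) %>%
  nrow()

n_rows
# 4
```

Selecting with `-a` names the ID column explicitly instead of relying on its position.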
Ronak Shah