0

How do I find the length of grouped columns, excluding NAs?

For example, with the following data frame,

Year  State var1  var2
TX    2     NA
WA    0     3
CA    NA    1
CA    2     NA
CA    2     3
TX    NA    4
WA    NA    NA
WA    3     3
CA    NA    0

I want it to return

State var1  var2
TX    1     1
WA    2     2
CA    2     3

Some similar questions have been asked (e.g. R - Get number of values per group without counting NAs) but I'm still struggling to get the output I want.

I've tried several variations on summarise_if and summarise_all, but they either throw up errors or don't calculate the right thing.

counts <- df %>%
group_by(State) %>% 
summarise_all(funs(length(!is.na(.))))
bm4n4d
  • 13
  • 3

3 Answers3

1

Just create an anonymous function that returns the sum of the logical vector returned by is.na():

library(dplyr)

df %>% 
  group_by(State) %>% 
  summarise_all(.funs = function(x) { sum(!is.na(x)) })

# A tibble: 3 x 3
   State  var1  var2
  <fctr> <int> <int>
1     CA     2     3
2     TX     1     1
3     WA     2     2
clemens
  • 6,653
  • 2
  • 19
  • 31
1

With data.table:

library(data.table)
setDT(df)
df[,lapply(.SD, function(x) sum(!is.na(x))),State]

   State var1 var2
1:    TX    1    1
2:    WA    2    2
3:    CA    2    3
YOLO
  • 20,181
  • 5
  • 20
  • 40
0

the dplyr approach:

df=tibble(State,var1,var2)%>%group_by(State)%>%summarize(var1=sum(!is.na(var1)),var2=sum(!is.na(var2)))
> df
# A tibble: 3 x 3
  State  var1  var2
  <chr> <int> <int>
1 CA        2     3
2 TX        1     1
3 WA        2     2
Antonios
  • 1,919
  • 1
  • 11
  • 18