1

I try to count values in group_by with NA in one column of data frame. I have data like this:

> df <- data.frame(id = c(1, 2, 3, NA, 4, NA),
                 group = c("A", "A", "B", "C", "D", "E"))
> df
  id group
1  1     A
2  2     A
3  3     B
4 NA     C
5  4     D
6 NA     E

I want to count groups having NA in first column as 0, but with an approach like this

> df %>% group_by(group) %>% summarise(n = n())
# A tibble: 5 x 2
  group     n
* <chr> <int>
1 A         2
2 B         1
3 C         1
4 D         1
5 E         1

i have 1 in rows C and E but not 0 which i want. The expected result looks like this:

# A tibble: 5 x 2
  group     n
* <chr> <int>
1 A         2
2 B         1
3 C         0
4 D         1
5 E         0

How can i do this?

1 Answers1

1

We can get the sum of a logical vector created with is.na to get the count as TRUE => 1 and FALSE => 0 so the sum returns the count of non-NA elements

library(dplyr)
df %>% 
   group_by(group) %>% 
   summarise(n = sum(!is.na(id)))
# A tibble: 5 x 2
#    group     n
# * <chr> <int>
#1 A         2
#2 B         1
#3 C         0
#4 D         1
#5 E         0

Or use length after subsetting

df %>%
   group_by(group) %>% 
   summarise(n = length(id[!is.na(id)]))

n() returns the total number of rows including the missing values

akrun
  • 874,273
  • 37
  • 540
  • 662