Which IDs have only zero-counts in variable across all days?

Question

My dataset contains the variable "cigarettes per day" (CPD), recorded over 21 days for several subjects (ID). I want to know how many subjects, and which ones, never smoked (i.e. have only 0 in CPD) across the 21 days.

Here is an example dataset for 3 subjects and 5 days:

day <- c(1,2,3,4,5,1,2,3,4,5,1,2,3,4,5)
ID <- c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3)
CPD <- c(3,4,0,2,0,0,0,0,0,0,4,0,0,0,1)
df <- data.frame(day, ID, CPD)

What I want would be something like this:

  day ID CPD
1   1  2   0
2   2  2   0
3   3  2   0
4   4  2   0
5   5  2   0

akrun · Accepted Answer · 2022-08-03T17:24:43.027

We can group by ID and keep only the groups in which all CPD values are 0:

library(dplyr)
df %>% 
   group_by(ID) %>%
   filter(all(CPD %in% 0)) %>%
   ungroup

-output

# A tibble: 5 × 3
    day    ID   CPD
  <dbl> <dbl> <dbl>
1     1     2     0
2     2     2     0
3     3     2     0
4     4     2     0
5     5     2     0
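
The question also asks how many and which subjects never smoked, not only for their rows. A minimal sketch of one way to get that with dplyr, assuming the example data from the question; the names `zero_ids` and `all_zero` are made up for illustration:

```r
library(dplyr)

# Example data from the question
day <- c(1,2,3,4,5,1,2,3,4,5,1,2,3,4,5)
ID  <- c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3)
CPD <- c(3,4,0,2,0,0,0,0,0,0,4,0,0,0,1)
df  <- data.frame(day, ID, CPD)

# One row per subject with a flag that is TRUE when every CPD value is 0,
# then keep the flagged subjects and extract their IDs as a vector
zero_ids <- df %>%
  group_by(ID) %>%
  summarise(all_zero = all(CPD == 0), .groups = "drop") %>%
  filter(all_zero) %>%
  pull(ID)

zero_ids          # which subjects never smoked (2 for the example data)
length(zero_ids)  # how many subjects never smoked (1 for the example data)
```

Note that if CPD contains missing values, `all(CPD == 0)` can return NA for a subject, and `filter()` then drops that subject.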

Or without grouping

df %>% 
   filter(!ID %in% ID[CPD != 0])

-output

  day ID CPD
1   1  2   0
2   2  2   0
3   3  2   0
4   4  2   0
5   5  2   0

Or with base R

subset(df, !ID %in% ID[CPD != 0])
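
To get the subject IDs and their count directly in base R, something along these lines should work. This is only a sketch on the example data from the question; `all_zero` and `zero_ids` are illustrative names:

```r
# Example data from the question
day <- c(1,2,3,4,5,1,2,3,4,5,1,2,3,4,5)
ID  <- c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3)
CPD <- c(3,4,0,2,0,0,0,0,0,0,4,0,0,0,1)
df  <- data.frame(day, ID, CPD)

# For each ID, check whether every CPD value is 0.
# tapply() returns a logical vector named by ID.
all_zero <- tapply(df$CPD, df$ID, function(x) all(x == 0))

# The names are character strings, so the IDs come back as text
zero_ids <- names(all_zero)[all_zero]

zero_ids       # which subjects never smoked ("2" for the example data)
sum(all_zero)  # how many subjects never smoked (1 for the example data)
```

If numeric IDs are needed, `as.numeric(zero_ids)` converts them back.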

score 2 · Answer 2 · edited Aug 03 '22 at 18:29

Here is a slightly modified version of @akrun's dplyr approach:

library(dplyr)

df %>% 
  group_by(ID) %>% 
  filter(all(CPD == 0))

# Groups:   ID [1]
    day    ID   CPD
  <dbl> <dbl> <dbl>
1     1     2     0
2     2     2     0
3     3     2     0
4     4     2     0
5     5     2     0

And here is a data.table approach:

library(data.table)

setDT(df)[, if (all(CPD == 0)) .SD, by = ID]

   ID day CPD
1:  2   1   0
2:  2   2   0
3:  2   3   0
4:  2   4   0
5:  2   5   0
