In my dataset there is the variable "cigarettes per day" (CPD), recorded for 21 days and several subjects (ID). I want to know how many and which subjects never smoked (i.e. have only 0 in CPD) across the 21 days.

Here is an example dataset for 3 subjects and 5 days:

day <- c(1,2,3,4,5,1,2,3,4,5,1,2,3,4,5)
ID <- c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3)
CPD <- c(3,4,0,2,0,0,0,0,0,0,4,0,0,0,1)
df <- data.frame(day, ID, CPD)

What I want is something like this:

  day ID CPD
1   1  2   0
2   2  2   0
3   3  2   0
4   4  2   0
5   5  2   0
any

2 Answers

We can group by ID and keep only the groups in which all CPD values are 0:

library(dplyr)
df %>% 
   group_by(ID) %>%
   # keep a subject only if every one of its CPD values is 0
   filter(all(CPD %in% 0)) %>%
   ungroup()

Output:

# A tibble: 5 × 3
    day    ID   CPD
  <dbl> <dbl> <dbl>
1     1     2     0
2     2     2     0
3     3     2     0
4     4     2     0
5     5     2     0
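
A note on the `%in% 0` test in the filter above: it behaves like `== 0` as long as CPD has no missing values, but the two differ once an `NA` appears. The following is a small base R sketch of that difference; the vector `cpd_with_na` is made up for illustration and is not part of the question's data.

```r
# Hypothetical CPD values for one subject, with one missing day
cpd_with_na <- c(0, NA, 0)

# %in% never returns NA: the missing day counts as "not 0"
strict <- all(cpd_with_na %in% 0)

# == propagates NA, so all() cannot decide and returns NA
loose <- all(cpd_with_na == 0)

# na.rm = TRUE ignores the missing day and only checks observed values
ignoring <- all(cpd_with_na == 0, na.rm = TRUE)

print(c(strict = strict, loose = loose, ignoring = ignoring))
```

Which behavior is appropriate depends on whether a subject with missing days should still count as a never-smoker; that is a judgment call the question does not settle.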

Or without grouping

df %>% 
   filter(!ID %in% ID[CPD != 0])

Output:

  day ID CPD
1   1  2   0
2   2  2   0
3   3  2   0
4   4  2   0
5   5  2   0

Or with base R

subset(df, !ID %in% ID[CPD != 0])
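
The approaches above return the rows of the never-smokers. Since the question also asks how many and which subjects these are, here is a minimal base R sketch that reduces the data to one logical value per subject. The names `never_smoked`, `never_ids` and `n_never` are chosen here for illustration, and the sketch assumes CPD has no missing values.

```r
# Example data from the question
day <- c(1,2,3,4,5,1,2,3,4,5,1,2,3,4,5)
ID  <- c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3)
CPD <- c(3,4,0,2,0,0,0,0,0,0,4,0,0,0,1)
df  <- data.frame(day, ID, CPD)

# One TRUE/FALSE per subject: TRUE if every CPD value is 0
never_smoked <- tapply(df$CPD == 0, df$ID, all)

# Which subjects never smoked (tapply stores the IDs as names, i.e. as character)
never_ids <- names(never_smoked)[never_smoked]

# How many subjects never smoked
n_never <- sum(never_smoked)

print(never_ids)
print(n_never)
```

For the example data this identifies subject 2 as the only never-smoker.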
akrun

Here is a slightly modified version of @akrun's dplyr approach:

library(dplyr)

df %>% 
  group_by(ID) %>% 
  # all() already returns TRUE/FALSE, so no comparison with TRUE is needed
  filter(all(CPD == 0))

# Groups:   ID [1]
    day    ID   CPD
  <dbl> <dbl> <dbl>
1     1     2     0
2     2     2     0
3     3     2     0
4     4     2     0
5     5     2     0

And here is a data.table approach:

library(data.table)

setDT(df)[, if (all(CPD == 0)) .SD, by = ID]

   ID day CPD
1:  2   1   0
2:  2   2   0
3:  2   3   0
4:  2   4   0
5:  2   5   0
Henrik
TarJae