Conditionally delete individuals from longitudinal data

Question

I have a longitudinal data set where I want to drop individuals (id) if they do not fulfill the criterion, indicated by criteria == 1, at any time point. To put it in context, criteria could denote whether the individual was living in the region of interest at a given time point. Here is some toy data with a structure similar to mine:

id       <- c(1,1,1, 2,2,2, 3,3,3, 4,4,4, 5,5,5)
time     <- c(1,2,3, 1,2,3, 1,2,3, 1,2,3, 1,2,3)
event    <- c(0,1,0, 1,0,0, 0,0,0, 0,1,0, 1,0,1)
criteria <- c(1,0,0, 0,0,0, 0,0,0, 1,1,1, 0,0,1)

df <- data.frame(id, time, event, criteria)

> df
   id time event criteria
1   1    1     0        1
2   1    2     1        0
3   1    3     0        0
4   2    1     1        0
5   2    2     0        0
6   2    3     0        0
7   3    1     0        0
8   3    2     0        0
9   3    3     0        0
10  4    1     0        1
11  4    2     1        1
12  4    3     0        1
13  5    1     1        0
14  5    2     0        0
15  5    3     1        1

Removing every id that has criteria == 0 at all time points (time) should lead to an end result looking like this:

   id time event criteria
1   1    1     0        1
2   1    2     1        0
3   1    3     0        0
4   4    1     0        1
5   4    2     1        1
6   4    3     0        1
7   5    1     1        0
8   5    2     0        0
9   5    3     1        1

I've been trying to achieve this by using dplyr::group_by(id) and then filtering on the criterion, but that does not give the result I want. I'd prefer a tidyverse solution! :D

Thanks!

score 1 · Accepted Answer · answered Mar 16 '21 at 10:32

library(dplyr)

df %>%
  group_by(id) %>%
  # flag every row of an id where criteria == 1 occurs at least once
  mutate(is_good = any(criteria == 1)) %>%
  filter(is_good)
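
A more compact variant, as a sketch: the any() condition can go straight into filter(), which skips the helper column. This assumes the toy data from the question (rebuilt here so the snippet runs on its own); the name kept is only illustrative.

```r
library(dplyr)

# Toy data from the question
df <- data.frame(
  id       = rep(1:5, each = 3),
  time     = rep(1:3, times = 5),
  event    = c(0,1,0, 1,0,0, 0,0,0, 0,1,0, 1,0,1),
  criteria = c(1,0,0, 0,0,0, 0,0,0, 1,1,1, 0,0,1)
)

kept <- df %>%
  group_by(id) %>%
  # any() yields one TRUE/FALSE per id, which is applied to all rows of that id
  filter(any(criteria == 1)) %>%
  ungroup()

unique(kept$id)
# [1] 1 4 5
```

The ungroup() at the end is optional; it just avoids carrying the grouping into later steps.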

Jakub.Novotny


Worked with one modification: remove the exclamation mark. Thanks! – ecl Mar 16 '21 at 12:24

score 1 · Answer 2 · answered Mar 16 '21 at 10:36

If you're willing to look into data.table, which I recommend, it is as simple as this:


library(data.table)
setDT(df) # make it a data.table

df[ , .SD[ !all(criteria==0) ], by=id ]

See this page for a general introduction and an explanation of the .SD idiom:

https://cran.r-project.org/web/packages/data.table/vignettes/datatable-intro.html
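
For comparison, here is a sketch of the same filter written with an explicit if inside j. It relies on data.table dropping groups whose j expression returns NULL. The toy data is rebuilt so the snippet runs on its own, and the names dt and kept are only illustrative.

```r
library(data.table)

# Toy data from the question, built directly as a data.table
dt <- data.table(
  id       = rep(1:5, each = 3),
  time     = rep(1:3, times = 5),
  event    = c(0,1,0, 1,0,0, 0,0,0, 0,1,0, 1,0,1),
  criteria = c(1,0,0, 0,0,0, 0,0,0, 1,1,1, 0,0,1)
)

# For each id, return that id's rows (.SD) only if criteria == 1 occurs
# at least once; otherwise the if yields NULL and the group is dropped
kept <- dt[, if (any(criteria == 1)) .SD, by = id]

unique(kept$id)
# [1] 1 4 5
```

any(criteria == 1) and !all(criteria == 0) agree here because criteria only takes the values 0 and 1 with no missing values; with NAs in criteria the two can differ.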
