
Suppose I have the following data frame:

id  flag            time
1   1   2017-01-01 UTC--2017-01-07 UTC
1   0   2018-01-01 UTC--2019-01-01 UTC
1   0   2017-01-03 UTC--2017-01-09 UTC
2   1   2017-01-01 UTC--2017-01-15 UTC
2   1   2018-07-01 UTC--2018-09-01 UTC
2   1   2018-10-12 UTC--2018-10-20 UTC
2   0   2017-01-12 UTC--2017-01-16 UTC
2   0   2017-03-01 UTC--2017-03-15 UTC
2   0   2017-12-01 UTC--2017-12-31 UTC
2   0   2018-08-15 UTC--2018-09-19 UTC
2   0   2018-10-01 UTC--2018-10-21 UTC

Created with the following code:

library(lubridate)

df <- data.frame(id   = c(1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2),
                 flag = c(1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0),
                 time = c(interval(ymd(20170101), ymd(20170107)),
                          interval(ymd(20180101), ymd(20190101)),
                          interval(ymd(20170103), ymd(20170109)),
                          # Cases (id 2, flag = 1)
                          interval(ymd(20170101), ymd(20170115)),
                          interval(ymd(20180701), ymd(20180901)),
                          interval(ymd(20181012), ymd(20181020)),
                          # Controls (id 2, flag = 0)
                          interval(ymd(20170112), ymd(20170116)),
                          interval(ymd(20170301), ymd(20170315)),
                          interval(ymd(20171201), ymd(20171231)),
                          interval(ymd(20180815), ymd(20180919)),
                          interval(ymd(20181001), ymd(20181021))))

And I want to obtain this result:

id  flag            time              value
1   1   2017-01-01 UTC--2017-01-07 UTC  NA
1   0   2018-01-01 UTC--2019-01-01 UTC  0
1   0   2017-01-03 UTC--2017-01-09 UTC  1
2   1   2017-01-01 UTC--2017-01-15 UTC  NA
2   1   2018-07-01 UTC--2018-09-01 UTC  NA
2   1   2018-10-12 UTC--2018-10-20 UTC  NA
2   0   2017-01-12 UTC--2017-01-16 UTC  1
2   0   2017-03-01 UTC--2017-03-15 UTC  0
2   0   2017-12-01 UTC--2017-12-31 UTC  0
2   0   2018-08-15 UTC--2018-09-19 UTC  1
2   0   2018-10-01 UTC--2018-10-21 UTC  1

That is, within each id I want to compare the interval of every flag = 0 row against all flag = 1 intervals of the same id, and set value to 1 if it overlaps at least one of them and 0 if it overlaps none. Rows with flag = 1 get NA.
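Stated as a rule (my reading of the expected output above), for row $i$ with interval $[s_i, e_i]$, using the same closed-interval overlap test that lubridate's int_overlaps() applies (shared endpoints count as overlap):

```latex
\text{value}_i =
\begin{cases}
\text{NA} & \text{if } \text{flag}_i = 1,\\
1 & \text{if } \text{flag}_i = 0 \text{ and } \exists j:\ \text{id}_j = \text{id}_i,\ \text{flag}_j = 1,\ s_i \le e_j,\ s_j \le e_i,\\
0 & \text{otherwise.}
\end{cases}
```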

For this purpose I have tried lubridate's int_overlaps() function.

I tried the following code, but it does not work:

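result <- df %>%
  group_by(id) %>%
  # Why this fails: any() does not return an Interval, and int_overlaps()
  # compares its two arguments element by element rather than every pair
  mutate(value = ifelse(flag == 0 & int_overlaps(time, any(time[flag == 1])), 1, 0))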
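To make the element-wise behaviour concrete, here is a small sketch built from the id 2 rows of the table (the names cases, control and hits are only illustrative). int_overlaps() recycles a single interval against a vector of intervals, and the per-row answer is the any() of that logical vector, which is the shape the attempt above is missing.

```r
library(lubridate)

# Illustrative values: the three flag == 1 intervals of id 2 ...
cases <- c(interval(ymd(20170101), ymd(20170115)),
           interval(ymd(20180701), ymd(20180901)),
           interval(ymd(20181012), ymd(20181020)))

# ... and the first flag == 0 interval of id 2
control <- interval(ymd(20170112), ymd(20170116))

# int_overlaps() is vectorised: the length-1 `control` is recycled against
# every element of `cases`, giving one TRUE/FALSE per case interval
hits <- int_overlaps(control, cases)
hits       # TRUE FALSE FALSE

# any() collapses the per-case results into the single answer for this row
any(hits)  # TRUE, so value should be 1
```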

I have found a very similar question:

R: Determine if each date interval overlaps with all other date intervals in a dataframe

torakxkz

2 Answers


You can use map_int() from purrr to check, within each id, whether each interval overlaps any of the flag == 1 intervals, keeping the result only for the flag == 0 rows:

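library(tidyverse)
library(lubridate)

df %>%
  group_by(id) %>%
  # For every row, 1 if its interval overlaps any flag == 1 interval of the
  # same id, else 0; ifelse() keeps that only for flag == 0 rows (NA otherwise)
  mutate(value = ifelse(flag == 0,
                        map_int(time, ~ any(int_overlaps(.x, time[flag == 1]))),
                        NA))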

Output

# A tibble: 11 x 4
# Groups:   id [2]
      id  flag time                           value
   <dbl> <dbl> <Interval>                     <int>
 1     1     1 2017-01-01 UTC--2017-01-07 UTC    NA
 2     1     0 2018-01-01 UTC--2019-01-01 UTC     0
 3     1     0 2017-01-03 UTC--2017-01-09 UTC     1
 4     2     1 2017-01-01 UTC--2017-01-15 UTC    NA
 5     2     1 2018-07-01 UTC--2018-09-01 UTC    NA
 6     2     1 2018-10-12 UTC--2018-10-20 UTC    NA
 7     2     0 2017-01-12 UTC--2017-01-16 UTC     1
 8     2     0 2017-03-01 UTC--2017-03-15 UTC     0
 9     2     0 2017-12-01 UTC--2017-12-31 UTC     0
10     2     0 2018-08-15 UTC--2018-09-19 UTC     1
11     2     0 2018-10-01 UTC--2018-10-21 UTC     1
Ben
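A different technique, not taken from either answer: compare every interval against all flag == 1 intervals of its id at once with outer() on the start and end times. The helper name overlap_any_case() is hypothetical, and the sketch assumes positive intervals (start before end), which holds for this data.

```r
library(lubridate)

# Hypothetical helper: 1 if a flag == 0 interval overlaps at least one
# flag == 1 interval, 0 if it overlaps none, NA for the flag == 1 rows.
# Meant to be called once per id (e.g. inside a grouped mutate()).
overlap_any_case <- function(time, flag) {
  s <- int_start(time)
  e <- int_end(time)
  ref <- flag == 1
  # All-pairs test: hit[i, j] is TRUE when interval i and reference
  # interval j share at least one instant (closed intervals)
  hit <- outer(s, e[ref], "<=") & outer(e, s[ref], ">=")
  out <- as.integer(rowSums(hit) > 0)
  out[ref] <- NA_integer_
  out
}

# Usage on the id 2 rows of the question
time2 <- c(interval(ymd(20170101), ymd(20170115)),
           interval(ymd(20180701), ymd(20180901)),
           interval(ymd(20181012), ymd(20181020)),
           interval(ymd(20170112), ymd(20170116)),
           interval(ymd(20170301), ymd(20170315)),
           interval(ymd(20171201), ymd(20171231)),
           interval(ymd(20180815), ymd(20180919)),
           interval(ymd(20181001), ymd(20181021)))
flag2 <- c(1, 1, 1, 0, 0, 0, 0, 0)

overlap_any_case(time2, flag2)  # NA NA NA 1 0 0 1 1
```

In the grouped pipeline it should slot in as `mutate(value = overlap_any_case(time, flag))` after `group_by(id)`. It builds an n by k logical matrix per group, which is cheap at this size but grows quadratically with group size.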

Here is another answer, adapted from the related question:

R: Determine if each date interval overlaps with all other date intervals in a dataframe

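library(tidyverse)
library(lubridate)

result <- df %>% group_by(id) %>%
  mutate(value = map_int(seq_along(time), function(x) {
    if (flag[x] == 1) return(NA_integer_)  # flag == 1 rows are the reference set
    y <- which(flag == 1)                  # positions of the flag == 1 rows in this id
    as.integer(any(int_overlaps(time[x], time[y])))
  }))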
torakxkz
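One pitfall worth spelling out when collecting the flag == 1 rows inside a group: seq_along(time[flag == 1]) numbers positions within the subset, so those numbers only coincide with the rows' positions in the group when the flag == 1 rows come first (as they happen to in this example data). which(flag == 1) returns the positions in the group itself. A small base-R check with a made-up flag vector:

```r
# A made-up group where the flag == 1 rows are NOT at the front
flag <- c(0, 1, 0, 1)

# Positions 1..k within the subset flag[flag == 1]; used to index the
# full group, these would pick rows 1 and 2, which is wrong here
seq_along(flag[flag == 1])  # 1 2

# Positions of the flag == 1 rows within the full group
which(flag == 1)            # 2 4
```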