0

I have a dataset with repeated measures for individuals. There are two date variables for each observation: date of entry and date of exit. If the time between date of last exit and date of next entry is less or equal to 30 then it is registered as an event.

What I want to do is keep track of how many events an individual has experienced in the last 365 days when his measurements are recorded again. I'm having a hard time creating this decay.

Below is an example dataset with an incorrect solution:

library(tidyverse)
library(lubridate)

tib_ex <- tibble(
  id = c(1, 1, 1, 1,
         2, 2, 2, 2, 2),
  date_in = ymd(c('2008-07-31', '2008-08-29', '2008-09-15', '2009-05-05', 
                  '2010-08-03', '2010-08-29', '2010-09-25', 
                  '2011-09-11', '2011-12-12')),
  date_out = ymd(c('2008-08-08', '2008-09-01', '2009-03-16', '2009-05-14', 
                   '2010-08-20', '2010-09-01', '2010-11-07',
                   '2011-11-25', '2011-12-16'))

)

tib_ex <- tib_ex %>%
  group_by(id) %>%
  mutate(time_between = as.numeric(date_in - lag(date_out)),
         time_state = as.numeric(date_out - date_in),
         return_30 = ifelse(time_between <= 30, 1, 0), 
         time_between = ifelse(is.na(time_between), 0, time_between),
         return_30 = ifelse(is.na(return_30), 0, return_30), 
         cum_time = cumsum(time_between) + cumsum(time_state))


tib_ex %>%
  group_by(id) %>%
  mutate(count = ifelse(date_in - lag(date_in, 1, default = 0) <= 365,
                        cumsum(return_30), 0))

Which creates the following tibble:

# A tibble: 9 x 8
# Groups:   id [2]
     id date_in    date_out   time_between time_state return_30 cum_time count
  <dbl> <date>     <date>            <dbl>      <dbl>     <dbl>    <dbl> <dbl>
1     1 2008-07-31 2008-08-08            0          8         0        8     0
2     1 2008-08-29 2008-09-01           21          3         1       32     1
3     1 2008-09-15 2009-03-16           14        182         1      228     2
4     1 2009-05-05 2009-05-14           50          9         0      287     2
5     2 2010-08-03 2010-08-20            0         17         0       17     0
6     2 2010-08-29 2010-09-01            9          3         1       29     1
7     2 2010-09-25 2010-11-07           24         43         1       96     2
8     2 2011-09-11 2011-11-25          308         75         0      479     2
9     2 2011-12-12 2011-12-16           17          4         1      500     3

Observations with id 1 have the correct count as the cumulative time never exceeds 365. The last two observations for id 2 have the incorrect count as they should 1 and 1 rather than 2 and 3 (because of decay).

docjay
  • 725
  • 2
  • 9
  • 17

1 Answers1

0

I found an answer to this question on StackExchange:

R: RunningTotal in the last 365 days window by Name

The following solution works:

tib_ex <- tib_ex %>%
  group_by (id) %>%
  arrange(date_in) %>% 
  mutate(day = date_in - date_in[1])

f <- Vectorize(function(i)
  sum(tib_ex[tib_ex$id[i] == tib_ex$id & tib_ex$day[i] - tib_ex$day >= 0 & 
           tib_ex$day[i] - tib_ex$day <= 365, "return_30"]), vec="i")
tib_ex$RunningTotal365 <- f(1:nrow(tib_ex))

It's basically copy/paste from the answer in the link above.

docjay
  • 725
  • 2
  • 9
  • 17