Calculating time intervals for long data

Question

I'm setting up a recurrent-event survival analysis and am struggling to work out how to calculate the time intervals between events. My data are in long format, with each row representing a hospital episode and the age (in months) at that episode. For the analysis I need to calculate the time between episodes for each participant. I realize that this will probably involve looping over participants, but I can't figure out how to get the time between event n and event n-1 within each participant.

I've found a previous question, "Date-time differences between rows in R", that partly answers my question, but it doesn't show how to do this for multiple events per participant when the number of events differs between participants.

            [ID] [age_of_hosp]
    [1,] 3600001  872
    [2,] 3600001  874
    [3,] 3600001  868
    [4,] 3600001  882
    [5,] 3600001  873
    [6,] 3600001  870
    [7,] 3600001  869
    [8,] 3600001  562
    [9,] 3600001  871
   [10,] 3600001  873
   [11,] 3600001  885
   [12,] 3600001  868
   [13,] 3600001  852
   [14,] 3600001  887
   [15,] 3600001  885
   [16,] 3600001  887
   [17,] 3600001  853
   [18,] 3600001  617
   [19,] 3600001  885
   [20,] 3600001  874
   [21,] 3600001  617
   [22,] 3600001  871
   [23,] 3600001  851
   [24,] 3600002   NA
   [25,] 3600003   NA
   [26,] 3600004  865
   [27,] 3600005  655
   [28,] 3600005  667
   [29,] 3600005  656
   [30,] 3600005  664
   [31,] 3600006  814
   [32,] 3600006  821
   [33,] 3600006  821
   [34,] 3600006  755
   [35,] 3600006  813

Any advice or pointers would be great!

It's best to mark an answer accepted, both so that future Googlers can find it and to encourage those who spent the time to answer. Also if you have any questions to clarify an answer, ask away! — Benjamin, May 24 '19 at 15:36
(A) Benjamin: It's considered poor form to ask for checkmarks. (B) The same thing could be said about your failure to upvote the question. — IRTFM, May 29 '19 at 22:48

Benjamin · Answer 1 · 2019-05-23T19:50:04.093

If you're open to solutions using packages like tibble and dplyr from the popular tidyverse set of R packages, you might try this:

First, to recreate your data using the tribble function:

library(tibble)
ages <- tribble(
      ~id, ~age_of_hosp,
  3600001,          872,
  3600001,          874,
  3600001,          868,
  3600001,          882,
  3600001,          873,
  3600001,          870,
  3600001,          869,
  3600001,          562,
  3600001,          871,
  3600001,          873,
  3600001,          885,
  3600001,          868,
  3600001,          852,
  3600001,          887,
  3600001,          885,
  3600001,          887,
  3600001,          853,
  3600001,          617,
  3600001,          885,
  3600001,          874,
  3600001,          617,
  3600001,          871,
  3600001,          851,
  3600002,           NA,
  3600003,           NA,
  3600004,          865,
  3600005,          655,
  3600005,          667,
  3600005,          656,
  3600005,          664,
  3600006,          814,
  3600006,          821,
  3600006,          821,
  3600006,          755,
  3600006,          813
)

Then getting to work:

The function you're looking for is lag from dplyr, which returns the previous value in a vector. If you're not familiar with the pipe operator (%>%), it takes the result of the previous function and "pipes" it into the next one.
First, I filter out the NA records, since I'm not sure what you want to do with those.
Then I arrange by id and age_of_hosp, in case the rows aren't already in that order.
Grouping by id ensures that lag returns the previous record for the same participant, not simply the previous row of the whole dataset.
mutate modifies fields or creates new ones. Here I use it to create a last_incident_age field and then immediately use that field to compute the time difference in months.
glimpse is just a nice way of looking at the resulting dataset. ;)
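As a small illustration of what lag does on its own, here is a minimal sketch on a plain vector. The three ages are taken from participant 3600001 in the question, and the variable names x, previous and gap are made up for the example. It is called as dplyr::lag to avoid picking up stats::lag, which is meant for time series objects and does not shift a plain vector this way.

```r
# dplyr::lag shifts a vector by one position, padding the front with NA.
# Values are three sorted ages (in months) from participant 3600001.
x <- c(562, 617, 851)

previous <- dplyr::lag(x)  # NA 562 617
gap <- x - previous        # NA  55 234
```

The first element has no predecessor, so its gap is NA. Within a grouped data frame the same thing happens at the first row of every group.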

library(dplyr)
ages %>% 
  filter(!is.na(age_of_hosp)) %>% 
  arrange(id, age_of_hosp) %>% 
  group_by(id) %>% 
  mutate(
    last_incident_age = lag(age_of_hosp, 1, default = NA),
    months_since_last = age_of_hosp - last_incident_age
  ) %>% 
  glimpse()
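
If you would rather not depend on any packages, the same idea can probably be expressed in base R with order(), ave() and diff(). This is only a sketch under assumptions: the data frame name ages_df is hypothetical, and the rows are a small subset of the question's data with the NA ages already dropped.

```r
# Toy subset of the question's data (participants 3600005 and 3600006).
# ages_df is an illustrative name, not something from the original post.
ages_df <- data.frame(
  id          = c(3600005, 3600005, 3600005, 3600005, 3600006, 3600006),
  age_of_hosp = c(655, 667, 656, 664, 814, 755)
)

# Sort by participant, then by age, so consecutive rows are consecutive episodes.
ages_df <- ages_df[order(ages_df$id, ages_df$age_of_hosp), ]

# Within each id, take the difference from the previous episode.
# The first episode of each participant has no predecessor, hence NA.
ages_df$months_since_last <- ave(
  ages_df$age_of_hosp, ages_df$id,
  FUN = function(x) c(NA, diff(x))
)
```

ave() applies the function to each participant's ages separately and puts the results back in the original row positions, which plays the role of group_by() plus mutate() above.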

score 0 · Answer 2 · answered Jul 13 '23 at 12:58

Another solution would be to use time_elapsed() from my package timeplyr.

NA values are automatically skipped over so no need to filter them out.

# Uncomment below line to install
# remotes::install_github("NicChr/timeplyr")
library(dplyr)
library(timeplyr)
ages <- ages %>%
  arrange(id, age_of_hosp)
ages %>%
  mutate(time_since_last = time_elapsed(age_of_hosp, time_by = 1),
         .by = id)
#> # A tibble: 35 x 3
#>         id age_of_hosp time_since_last
#>      <dbl>       <dbl>           <dbl>
#>  1 3600001         562              NA
#>  2 3600001         617              55
#>  3 3600001         617               0
#>  4 3600001         851             234
#>  5 3600001         852               1
#>  6 3600001         853               1
#>  7 3600001         868              15
#>  8 3600001         868               0
#>  9 3600001         869               1
#> 10 3600001         870               1
#> # i 25 more rows

Created on 2023-07-13 with reprex v2.0.2

If you have a large number of groups, you can also use the g argument.

time_elapsed(ages$age_of_hosp, g = ages$id, time_by = 1)
 [1]  NA  55   0 234   1   1  15   0   1   1   1   0   1   1   0   1   0   8   3   0   0   2   0  NA  NA  NA
[27]  NA   1   8   3  NA  58   1   7   0
