20

I have a dataframe of ids and timestamps. I'd like to calculate the difference between each sequential timestamp for an individual id.

My dataframe looks like this:

id  time
Alpha   1
Alpha   4
Alpha   7
Beta    5
Beta    10

I'm trying to add a column like time.difference below:

id  time    time.difference
Alpha   1   NA
Alpha   4   3
Alpha   7   4
Beta    5   NA
Beta    10  5

Is there a clean way to do this using dplyr? (or tidyr or something else that's easier to read than vanilla R?)

Henrik
  • 65,555
  • 14
  • 143
  • 159
Thalecress
  • 3,231
  • 9
  • 35
  • 47

2 Answers2

34

Like this:

dat %>% 
  group_by(id) %>% 
  mutate(time.difference = time - lag(time))
bergant
  • 7,122
  • 1
  • 20
  • 24
  • 4
    One thing to add to this answer (after a long wait). This works because the data is already sorted by id and time in the example. If it wasn't sorted, `lag` would not work reliably. A more robust solution would add `arrange(time) %>%` between the `group_by' and the `mutate` functions. That would ensure that the data is sorted properly for the `lag` function – hank_044 May 20 '19 at 19:41
  • It does not seem to work either if you group by several columns. – jjmerelo Mar 22 '20 at 12:11
  • @hank_044 I believe you need to arrange by id, not time. At least then it works for me. It should read `arrange(id) %>%` – David Apr 27 '21 at 23:17
7

using data.table

library(data.table)
library(dplyr)
setDT(dat)[, time.difference := time - lag(time, 1L), by = id]
Veerendra Gadekar
  • 4,452
  • 19
  • 24