Calculating the difference between consecutive rows by group using dplyr?

Question

I have a dataframe of ids and timestamps. I'd like to calculate the difference between each sequential timestamp for an individual id.

My dataframe looks like this:

id  time
Alpha   1
Alpha   4
Alpha   7
Beta    5
Beta    10

I'm trying to add a column like time.difference below:

id  time    time.difference
Alpha   1   NA
Alpha   4   3
Alpha   7   4
Beta    5   NA
Beta    10  5

Is there a clean way to do this using dplyr? (or tidyr or something else that's easier to read than vanilla R?)

score 34 · Accepted Answer · answered Jul 11 '15 at 22:25

34

Like this:

dat %>% 
  group_by(id) %>% 
  mutate(time.difference = time - lag(time))

answered Jul 11 '15 at 22:25

bergant

4

One thing to add to this answer (after a long wait). This works because the data is already sorted by id and time in the example. If it wasn't sorted, `lag` would not work reliably. A more robust solution would add `arrange(time) %>%` between the `group_by' and the `mutate` functions. That would ensure that the data is sorted properly for the `lag` function – hank_044 May 20 '19 at 19:41
It does not seem to work either if you group by several columns. – jjmerelo Mar 22 '20 at 12:11
@hank_044 I believe you need to arrange by id, not time. At least then it works for me. It should read `arrange(id) %>%` – David Apr 27 '21 at 23:17

Veerendra Gadekar · Answer 2 · 2015-07-12T00:24:52.550

7

using data.table

library(data.table)
library(dplyr)
setDT(dat)[, time.difference := time - lag(time, 1L), by = id]

edited Jul 12 '15 at 00:24

answered Jul 11 '15 at 23:40

Veerendra Gadekar

2 Answers2