58

In R, let's say I have this data frame:

Data
id      date        value
2380    10/30/12    21.01
2380    10/31/12    22.04
2380    11/1/12     22.65
2380    11/2/12     23.11
20100   10/30/12    35.21
20100   10/31/12    37.07
20100   11/1/12     38.17
20100   11/2/12     38.97
20103   10/30/12    57.98
20103   10/31/12    60.83 

And I want to subtract the previous value from the current value, by group ID date, to create this:

id      date        value   diff
2380    10/30/12    21.01   0
2380    10/31/12    22.04   1.03
2380    11/1/12     22.65   0.61
2380    11/2/12     23.11   0.46
20100   10/30/12    35.21   0
20100   10/31/12    37.07   1.86
20100   11/1/12     38.17   1.1
20100   11/2/12     38.97   0.8
20103   10/30/12    57.98   0
20103   10/31/12    60.83   2.85
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
haitham
  • 3,398
  • 6
  • 21
  • 35

3 Answers3

105

With dplyr:

library(dplyr)

data %>%
    group_by(id) %>%
    arrange(date) %>%
    mutate(diff = value - lag(value, default = first(value)))

For clarity you can arrange by date and grouping column (as per comment by lawyer)

data %>%
    group_by(id) %>%
    arrange(date, .by_group = TRUE) %>%
    mutate(diff = value - lag(value, default = first(value)))

or lag with order_by:

data %>%
    group_by(id) %>%
    mutate(diff = value - lag(value, default = first(value), order_by = date))

With data.table:

library(data.table)

dt <- as.data.table(data)
setkey(dt, id, date)
dt[, diff := value - shift(value, fill = first(value)), by = id]
Community
  • 1
  • 1
zero323
  • 322,348
  • 103
  • 959
  • 935
  • 1
    Hello @zero323, I am trying to apply your solution to the same problem but somehow I get only zeros in the "diff" column. Any idea about what could happen? – Lucas May 05 '17 at 17:10
  • @Lucas Only item per group would be my first guess. – zero323 May 06 '17 at 08:55
  • The dplyr solution worked for me only when I modified the third line to arrange(date, .by_group = TRUE). The second argument arranges within the groups (id in the example data). Otherwise the dates were arranged from top to bottom for all the dates. – lawyeR Oct 29 '17 at 16:18
  • Expanding on @lawyeR s comment, you can also do arrange(id, date) – digestivee May 22 '18 at 08:59
  • This piece could be mutate to perform other calculation, such as division data %>% group_by(id) %>% arrange(date, .by_group = TRUE) %>% mutate(diff = value / lag(value, default = first(value))) – Little Bee Mar 10 '19 at 18:25
26

You can do this with the ave function:

data$diff <- ave(data$value, data$id, FUN=function(x) c(0, diff(x)))
data
#       id                date value diff
# 1   2380 2012-10-30 00:15:51 21.01 0.00
# 2   2380 2012-10-31 00:31:03 22.04 1.03
# 3   2380 2012-11-01 00:16:02 22.65 0.61
# 4   2380 2012-11-02 00:15:32 23.11 0.46
# 5  20100 2012-10-30 00:15:38 35.21 0.00
# 6  20100 2012-10-31 00:15:48 37.07 1.86
# 7  20100 2012-11-01 00:15:49 38.17 1.10
# 8  20100 2012-11-02 00:15:19 38.97 0.80
# 9  20103 2012-10-30 10:27:34 57.98 0.00
# 10 20103 2012-10-31 12:24:42 60.83 2.85

The first argument is the data to be operated on, the second argument is the group, and the last argument is the function to be applied to the data from each group.

josliber
  • 43,891
  • 12
  • 98
  • 133
  • Hi, I know it has been long since this answer was posted, but this works great! Do you mind explaining what is being done here? – asterx Oct 13 '16 at 18:33
  • @asterx As the last sentence in the answer states, this takes the value variable, groups it by the id variable, and then calls the passed function on the data for each group. In this case, the passed function returns 0 followed by the difference between each consecutive value within the group. – josliber Oct 13 '16 at 18:35
  • 2
    Thanks! ```diff(x)``` basically returns a vector 1 shorter than its input which is combined with 0 as the heading element to make it the same size. Got it. – asterx Oct 13 '16 at 20:26
  • All these answers are awesome. – Unnikrishnan Feb 23 '21 at 07:32
0

Awesome answers! Just wanted to add that if you want to make your data consecutively to work with the code above you can do that with order, e.g.:

data <- data[with{data, order(id, date)), ]
data$diff <- ave(data$value, data$id, FUN=function(x) c(0, diff(x)))

See: Calculate difference between values in rows by two grouping variables

Kim
  • 80
  • 10