Calculate difference between values in consecutive rows by group

Question

This is a my df (data.frame):

group value
1     10
1     20
1     25
2     5
2     10
2     15

I need to calculate difference between values in consecutive rows by group.

So, I need a that result.

group value diff
1     10    NA # because there is a no previous value
1     20    10 # value[2] - value[1]
1     25    5  # value[3] value[2]
2     5     NA # because group is changed
2     10    5  # value[5] - value[4]
2     15    5  # value[6] - value[5]

Although, I can handle this problem by using ddply, but it takes too much time. This is because I have a lot of groups in my df. (over 1,000,000 groups in my df)

Are there any other effective approaches to handle this problem?

score 136 · Accepted Answer · edited Jun 13 '18 at 16:53

136

The package data.table can do this fairly quickly, using the shift function.

require(data.table)
df <- data.table(group = rep(c(1, 2), each = 3), value = c(10,20,25,5,10,15))
#setDT(df) #if df is already a data frame

df[ , diff := value - shift(value), by = group]    
#   group value diff
#1:     1    10   NA
#2:     1    20   10
#3:     1    25    5
#4:     2     5   NA
#5:     2    10    5
#6:     2    15    5
setDF(df) #if you want to convert back to old data.frame syntax

Or using the lag function in dplyr

df %>%
    group_by(group) %>%
    mutate(Diff = value - lag(value))
#   group value  Diff
#   <int> <int> <int>
# 1     1    10    NA
# 2     1    20    10
# 3     1    25     5
# 4     2     5    NA
# 5     2    10     5
# 6     2    15     5

For alternatives pre-data.table::shift and pre-dplyr::lag, see edits.

edited Jun 13 '18 at 16:53

Henrik

65,555
14
143
159

answered Feb 13 '13 at 04:56

Blue Magister

13,044
5
38
56

1

Any chance you know a ddply solution? I had been working at this for a while before I reasoned that might need a different function. . . – Jack Ryan Mar 07 '13 at 17:35
3

I assume it would be something like `ddply(df, .(group), transform, diff=c(NA,diff(value)))` – Blue Magister Mar 07 '13 at 18:21
how can you modify this to calculate percentage change? – Hardik Gupta Feb 26 '18 at 06:35
You could apply any function here. For example, if this is our function: `perc_change <- function(x, y) { return(x/y*100) }`, then we could call it like `df[ , perc_diff := perc_change(value, shift(value)), by = group]` – srctaha Aug 14 '18 at 06:45
How can you substitute the NA to the next value so `Diff` for row 1 will also be `10` as it is for row 2? – Manasi Shah Oct 03 '18 at 22:48
@ManasiShah: `library(dplyr)`, line 2: `df$Diff <- as.data.frame((df %>% group_by(group) %>% mutate(Diff = value - lag(value)))[ , 3])` line 3: `df$Diff[is.na(df$Diff)] <- unlist((df %>% group_by(group) %>%summarise(start.values = first(value)))[ , 2])` – InColorado Oct 27 '21 at 16:08

score 23 · Answer 2 · answered May 04 '15 at 20:42

23

You can use the base function ave() for this

df <- data.frame(group=rep(c(1,2),each=3),value=c(10,20,25,5,10,15))
df$diff <- ave(df$value, factor(df$group), FUN=function(x) c(NA,diff(x)))

which returns

  group value diff
1     1    10   NA
2     1    20   10
3     1    25    5
4     2     5   NA
5     2    10    5
6     2    15    5

answered May 04 '15 at 20:42

MrFlick

195,160
17
277
295

Related question: http://stackoverflow.com/questions/30378946/calculating-time-difference-by-id I found both threads to be helpful. For dates, you can use this: `df$diff <- ave(as.numeric(df$Datevalue), factor(df$group), FUN=function(x) c(NA,diff(x)))` Also, you can switch the position of the NA if you want the last row in the group to have the NA instead. `df$diff <- ave(as.numeric(df$Datevalue), factor(df$group), FUN=function(x) c(diff(x), NA))` – Brian D Nov 10 '16 at 20:25

Aditya Sihag · Answer 3 · 2013-02-13T12:31:19.143

4

try this with tapply

df$diff<-as.vector(unlist(tapply(df$value,df$group,FUN=function(x){ return (c(NA,diff(x)))})))

edited Feb 13 '13 at 12:31

answered Feb 13 '13 at 05:01

Aditya Sihag

5,057
4
32
43

3

this is what I had to do: `df$diff <- unlist(tapply(df$value,df$group, function(x) c(NA,diff(x))))` – Tyler Rinker Feb 13 '13 at 05:24

score 0 · Answer 4 · answered Jan 31 '23 at 09:53

0

Since dplyr 1.1.0, you can shorten the dplyr version with inline temporary grouping with .by:

mutate(df, diff = value - lag(value), .by = group)

answered Jan 31 '23 at 09:53

Maël

45,206
3
29
67

Calculate difference between values in consecutive rows by group

4 Answers4

Linked

Related