2

All of the lag example I see use a continuous time series. I am trying to calculate a percent change by year, however, it would not make sense for me to calculate if there is a gab in between. i.e. I would not want a percent change from 2001 to 2004. Only interested in between two years. Example of data input:

structure(list(ID = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L), .Label = c("A", "B"), class = "factor"), 
    Year = c(2000L, 2001L, 2004L, 2005L, 2006L, 2007L, 1990L, 
    2000L, 2001L, 2005L, 2006L, 2007L, 2009L), Value = c(4L, 
    10L, 7L, 4L, 7L, 5L, 2L, 7L, 10L, 6L, 9L, 2L, 9L)), .Names = c("ID", 
"Year", "Value"), class = "data.frame", row.names = c(NA, -13L
))

df <-  df %>%  group_by(ID) %>%
  mutate(delta = (Value-lag(Value))/lag(Value))

The line above does not return my desired output, ignoring places that jump. Desired output:

structure(list(ID = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L), .Label = c("A", "B"), class = "factor"), 
    Year = c(2000L, 2001L, 2004L, 2005L, 2006L, 2007L, 1990L, 
    2000L, 2001L, 2005L, 2006L, 2007L, 2009L), Value = c(4L, 
    10L, 7L, 4L, 7L, 5L, 2L, 7L, 10L, 6L, 9L, 2L, 9L), Change = c(NA, 
    1.5, NA, -0.428571429, 0.75, -0.285714286, NA, 2.5, 0.428571429, 
    NA, 0.5, -0.777777778, NA)), .Names = c("ID", "Year", "Value", 
"Change"), class = "data.frame", row.names = c(NA, -13L))
hhhaaa1
  • 35
  • 1
  • 6

4 Answers4

6

using dplyr:

df %>%  group_by(ID) %>%
    mutate(delta =  ifelse((Year - lag(Year)) > 1, NA, (Value-lag(Value))/lag(Value)))
Sumedh
  • 4,835
  • 2
  • 17
  • 32
3

Here is a data.table solution:

# load library and convert to data.table
library(data.table)
setDT(df)

df[, "Change" := ifelse(Year-shift(Year)==1, 
   (Value-shift(Value))/shift(Value), NA), by="ID"]

which returns

df
    ID Year Value     Change
 1:  A 2000     4         NA
 2:  A 2001    10  1.5000000
 3:  A 2004     7         NA
 4:  A 2005     4 -0.4285714
 5:  A 2006     7  0.7500000
 6:  A 2007     5 -0.2857143
 7:  B 1990     2         NA
 8:  B 2000     7         NA
 9:  B 2001    10  0.4285714
10:  B 2005     6         NA
11:  B 2006     9  0.5000000
12:  B 2007     2 -0.7777778
13:  B 2009     9         NA

This uses ifelse, which can be slow with enormous data sets, but if the data set is in the thousands of observations size, this won't be noticeable.

lmo
  • 37,904
  • 9
  • 56
  • 69
3

Here a possible solution using the diff function.

library(dplyr)
df <-  df %>%  group_by(ID) %>%
  mutate(delta = (Value-lag(Value))/lag(Value))

#find the difference between each row
yeardiff<-c(0,diff(df$Year) )
#for any row with a difference not equal to one set to NA
df$delta[yeardiff !=1]<-NA
Dave2e
  • 22,192
  • 18
  • 42
  • 50
2

We can just use base R functions to get the output

lv <- with(df, ave(Value, ID, FUN = function(x) c(NA, x[-length(x)])))
ly <- with(df, ave(Year, ID, FUN = function(x) c(NA, x[-length(x)])))
df$Change <- with(df, ifelse((Year -ly) >1, NA, (Value - lv)/lv))
df$Change
#[1]         NA  1.5000000         NA -0.4285714  0.7500000 
#[6] -0.2857143         NA         NA  0.4285714         NA  
#[11] 0.5000000 -0.7777778         NA
akrun
  • 874,273
  • 37
  • 540
  • 662