All of the `lag()` examples I have seen use a continuous time series. I am trying to calculate a percent change by year, but the calculation only makes sense between two consecutive years. If there is a gap, I want no value; for example, I do not want a percent change from 2001 to 2004. Example input data:
```r
df <- structure(list(ID = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 2L, 2L), .Label = c("A", "B"), class = "factor"),
Year = c(2000L, 2001L, 2004L, 2005L, 2006L, 2007L, 1990L,
2000L, 2001L, 2005L, 2006L, 2007L, 2009L), Value = c(4L,
10L, 7L, 4L, 7L, 5L, 2L, 7L, 10L, 6L, 9L, 2L, 9L)), .Names = c("ID",
"Year", "Value"), class = "data.frame", row.names = c(NA, -13L
))
```

My current attempt:

```r
library(dplyr)

df <- df %>%
  group_by(ID) %>%
  mutate(delta = (Value - lag(Value)) / lag(Value))
```
The code above does not give the desired output: `lag()` simply uses the previous row, so it also computes a change across gaps in `Year` instead of skipping them. Desired output:
```r
structure(list(ID = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 2L, 2L), .Label = c("A", "B"), class = "factor"),
Year = c(2000L, 2001L, 2004L, 2005L, 2006L, 2007L, 1990L,
2000L, 2001L, 2005L, 2006L, 2007L, 2009L), Value = c(4L,
10L, 7L, 4L, 7L, 5L, 2L, 7L, 10L, 6L, 9L, 2L, 9L), Change = c(NA,
1.5, NA, -0.428571429, 0.75, -0.285714286, NA, NA, 0.428571429,
NA, 0.5, -0.777777778, NA)), .Names = c("ID", "Year", "Value",
"Change"), class = "data.frame", row.names = c(NA, -13L))
```
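A minimal base-R sketch of one way to get this, assuming "consecutive" means the two years differ by exactly 1 and that rows may be re-sorted by `ID` and `Year`: look up the previous row's `Year` and `Value` within each `ID`, and only compute the percent change when the years are adjacent. The helper `prev()` is a hypothetical name introduced here; `ave()`, `head()` and `ifelse()` are base R. The data is rebuilt with `data.frame()` so the snippet has no package dependencies. Under the rule stated above, the 2000 row for ID B (which follows 1990) also gets `NA`.

```r
# Example data from the question, rebuilt without structure()/dput
df <- data.frame(
  ID    = factor(rep(c("A", "B"), times = c(6, 7))),
  Year  = c(2000L, 2001L, 2004L, 2005L, 2006L, 2007L, 1990L,
            2000L, 2001L, 2005L, 2006L, 2007L, 2009L),
  Value = c(4L, 10L, 7L, 4L, 7L, 5L, 2L, 7L, 10L, 6L, 9L, 2L, 9L)
)

# Sort by ID, then Year, so "previous row" means "previous year on record"
df <- df[order(df$ID, df$Year), ]

# Hypothetical helper: shift a vector down by one, padding the start with NA
prev <- function(x) c(NA, head(x, -1))

# Previous Year and Value within each ID (NA on the first row of each ID)
prev_year  <- ave(df$Year,  df$ID, FUN = prev)
prev_value <- ave(df$Value, df$ID, FUN = prev)

# Only compute a percent change when the previous row is exactly one year earlier
consecutive <- !is.na(prev_year) & (df$Year - prev_year == 1L)
df$Change <- ifelse(consecutive, (df$Value - prev_value) / prev_value, NA_real_)

print(df)
```

With dplyr, the same condition should fit into the existing pipeline as a sketch along the lines of `group_by(ID) %>% arrange(Year, .by_group = TRUE) %>% mutate(Change = ifelse(Year - lag(Year) == 1, (Value - lag(Value)) / lag(Value), NA))`: the first row of each group gives `NA` for the year difference, and `ifelse()` passes that `NA` through.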