
Whenever I want to lag a variable in a data frame, I realize that something that should be simple is not. Although this problem has been asked and answered many times (see p.s.), I have not found a simple solution that I can remember until the next time I need a lag. In general, lagging does not seem to be simple in R, as the many workarounds testify. I run into this problem often, and it would be very helpful to have some basic R solutions that do not need extra packages. Could you provide your simple solution for lagging?

If that is not possible, could you at least provide your workaround here so we can choose among second-best alternatives? One collection already exists here.

Also, in every blog post on this subject I see people complain about how unexpectedly difficult lagging is, so how can we get a simple lag function for data frames into R core? This must be extremely disappointing for anyone coming from Stata or EViews. Or am I missing something, and is there a simple built-in solution?

Say we want to lag "value" by 3 years within each "country" in this data:

Data <- data.frame(year=c(rep(2010:2015,2)),country=c(rep("AT",6),rep("DE",6)),value=rnorm(12))

to create a column L3 like this:

 year country   value      L3
 2010      AT  0.3407      NA
 2011      AT -1.7981      NA
 2012      AT -0.8390      NA
 2013      AT -0.6888  0.3407
 2014      AT -1.1019 -1.7981
 2015      AT -0.8953 -0.8390
 2010      DE  0.5877      NA
 2011      DE -1.0204      NA
 2012      DE -0.6576      NA
 2013      DE  0.6620  0.5877
 2014      DE  0.9579 -1.0204
 2015      DE -0.7774 -0.6576

And we neither want to change the nature of our data (to ts or data.table) nor do we want to immerse ourselves in three new packages when the deadline is tonight and our supervisor uses Stata and thinks lagging is easy ;-) (it's not, I just want to be prepared...)

p.s.:

without groups

with data.table: Lag in dataframe or How to create a lag variable within each group?

time series are straightforward

Jakob
  • You could try something like `lag <- 3; Data <- cbind(Data, L3=ave(Data$value, Data$country, FUN=function(x) c(rep(NA,lag),head(x, n=-lag))))`. This uses only base R but may be a bit more complicated than what you were looking for. – WaltS Dec 10 '15 at 15:49
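Building on that comment, one way to get something easy to remember is to define a small helper once and reuse it with ave(). The name shift_vec below is hypothetical (it is not part of base R); this is a minimal sketch that assumes the rows are already sorted by year within each country. A positive k gives a lag, a negative k gives a lead, and positions that fall outside the vector become NA.

```r
# shift_vec is a hypothetical helper, not a base R function.
# Shift x by k positions: k > 0 lags, k < 0 leads; out-of-range slots become NA.
shift_vec <- function(x, k = 1L) {
  n <- length(x)
  idx <- seq_len(n) - k            # position each output element is taken from
  idx[idx < 1 | idx > n] <- NA     # shifted past either end -> NA
  x[idx]                           # indexing keeps the type/class of x
}

# Grouped lag: ave() applies shift_vec separately within each country.
# Assumes Data is sorted by year within country.
Data <- data.frame(year = rep(2010:2015, 2),
                   country = rep(c("AT", "DE"), each = 6),
                   value = rnorm(12))
Data$L3 <- ave(Data$value, Data$country, FUN = function(v) shift_vec(v, 3))
```

Because the shift is done by indexing rather than by concatenating with NA, the result keeps the class of the input (for example, factors or Dates), and a k larger than the group size simply yields all NA.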

2 Answers


Try slide() from the DataCombine package; it's simple:

library(DataCombine)
slide(Data, Var = 'value', GroupVar = 'country', slideBy = -3)

user_flow

If the question is how to add a column holding each country's value from three years earlier without using packages, then try this:

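# Lag x by k positions: pad the front with k NAs and drop the last k values.
# Assumes rows are sorted by year within each country, with no missing years.
prior_year3 <- function(x, k = 3) head(c(rep(NA, k), x), length(x))
transform(Data, prior_year_value = ave(value, country, FUN = prior_year3))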
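The approach above shifts by position within each country, so it gives wrong results if the rows are not sorted by year or if a country is missing a year. A minimal sketch of an alternative that looks the value up by the actual year instead, assuming each (country, year) pair occurs at most once and year is integer-valued; lag_by_time is a hypothetical name, not an existing function:

```r
# lag_by_time is a hypothetical helper, not a base R function.
# For each row, return var from the same group exactly k time units earlier,
# or NA if no such row exists. Row order and gaps in time do not matter.
# Assumes each (group, time) pair is unique and time is integer-valued.
lag_by_time <- function(df, var, group, time, k = 1) {
  key_now  <- paste(df[[group]], df[[time]], sep = "\r")
  key_prev <- paste(df[[group]], df[[time]] - k, sep = "\r")
  df[[var]][match(key_prev, key_now)]
}

Data <- data.frame(year = rep(2010:2015, 2),
                   country = rep(c("AT", "DE"), each = 6),
                   value = rnorm(12))
Data$L3 <- lag_by_time(Data, "value", "country", "year", k = 3)
```

Because it matches on year values, the result does not depend on the row order of Data, and a missing year yields NA rather than the value from whatever row happens to sit three positions earlier.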

giving:

   year country       value prior_year_value
1  2010      AT -1.66562121               NA
2  2011      AT -0.04950063               NA
3  2012      AT  1.55930293               NA
4  2013      AT -0.40462394      -1.66562121
5  2014      AT  0.78602610      -0.04950063
6  2015      AT  0.73912916       1.55930293
7  2010      DE  1.03710539               NA
8  2011      DE -1.13370942               NA
9  2012      DE -1.20530981               NA
10 2013      DE  1.66870572       1.03710539
11 2014      DE  1.53615793      -1.13370942
12 2015      DE -0.09693335      -1.20530981

That said, to use R effectively you do need to learn how to use the key packages.

G. Grothendieck
  • Thanks! One comment: "That said, to use R effectively you do need to learn how to use the key packages." - of course, but lagging should be simpler than that. I also don't download packages for subtraction and addition. – Jakob Dec 11 '15 at 11:59
  • I think the expectation is that if you use lag you will do it on time series objects (and lag is available and does work on ts objects in the core) so if you want to use lag on anything else you are not in mainstream usage and should use a package. – G. Grothendieck Dec 11 '15 at 12:41
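To illustrate the point about ts objects: stats::lag() does not move values within a vector; it shifts the time base of the series, and the NA padding only appears once two series are aligned on time, for example with cbind() on ts objects. A small sketch with made-up values (1 to 6) standing in for one country's series:

```r
# On a plain vector, stats::lag() only attaches shifted time attributes;
# the values themselves are unchanged.
plain <- as.numeric(stats::lag(1:5, 1))   # still 1 2 3 4 5

# On a ts, lag(x, -3) makes the series start 3 periods later (2013).
x <- ts(1:6, start = 2010)
lagged <- stats::lag(x, -3)

# cbind() on ts objects aligns by time, padding with NA;
# window() trims back to the original 2010-2015 span.
both <- window(cbind(value = x, L3 = lagged), end = 2015)
print(both)
```

This is why lag() looks broken on data frame columns: without a time base to align against, shifting the time attributes has no visible effect.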