
I am attempting to write a for loop which will take subsets of a dataframe by person id and then lag the EXAMDATE variable by one for comparison. So a given row will have the original EXAMDATE and also a variable EXAMDATE_LAG which will contain the value of the EXAMDATE one row before it.

for (i in length(uniquerid))
{
    temp <- subset(part2test, RID==uniquerid[i])
    temp$EXAMDATE_LAG <- temp$EXAMDATE
    temp2 <- data.frame(lag(temp, -1, na.pad=TRUE))  
    temp3 <- data.frame(cbind(temp,temp2))
}

It seems that I am creating the new variable just fine, but I know that the lag won't work properly because I am missing steps. Perhaps I have also misunderstood other people's examples of how to use the lag function?

csgillespie
Rlearner
  • Could you please provide some example data (`part2test`)? – Sven Hohenstein Sep 14 '12 at 18:31
  • This sounds like a good case for `plyr` or `data.table`. `library(plyr); ddply(part2test, .(uniquerid), transform, EXAMDATE_LAG=lag(EXAMDATE, -1, na.pad=TRUE))` or something like that. – Justin Sep 14 '12 at 18:44
  • Don't use the `lag()` function. Time-series functions in R are difficult to understand. If this is a Date-class variable, just subtract 1 or shift with `c(temp$EXAMDATE[-1], NA)`, depending on what you mean. – IRTFM Sep 14 '12 at 19:14
  • +1 to @Dwin's comment. I'd use something like my code and replace the `lag(...)` with his version. – Justin Sep 14 '12 at 19:18
  • You are overwriting temp each iteration, and it isn't really iterating because there's no sequence. – Luciano Selzer Sep 14 '12 at 19:26

1 Answer


So that this can be fully answered: there are a handful of things wrong with your code. Luciano has pointed one out. Each time through your loop you create (or overwrite) temp, temp2, and temp3, so you are left with only the output of the last pass through the loop.
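If you did want to keep the loop, a minimal sketch of a repaired version might look like the following. The example data is hypothetical (the real `part2test` was not posted), and the shift used is the `c(x[-1], NA)` form from the comments. Note that `for (i in length(uniquerid))` runs exactly once, with `i` equal to the length; `seq_along()` gives the full sequence of indices.

```r
# Hypothetical example data; the real part2test was not posted
part2test <- data.frame(
  RID = c(1, 1, 2, 2, 2),
  EXAMDATE = as.Date(c("2012-01-10", "2012-02-10",
                       "2012-01-15", "2012-02-15", "2012-03-15"))
)
uniquerid <- unique(part2test$RID)

# Pre-allocate a list so each person's result is kept, not overwritten
results <- vector("list", length(uniquerid))

for (i in seq_along(uniquerid)) {  # iterates over 1, 2, ..., length(uniquerid)
  temp <- subset(part2test, RID == uniquerid[i])
  # Shift within this person's rows only, padding the end with NA
  temp$EXAMDATE_LAG <- c(temp$EXAMDATE[-1], NA)
  results[[i]] <- temp
}

# Stack the per-person pieces back into one data frame
combined <- do.call(rbind, results)
```

Storing each piece in `results[[i]]` is what keeps the earlier iterations from being lost.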

However, this isn't something that needs a loop. Instead, you can make use of the vectorized nature of R:

x <- 1:10

> c(x[-1], NA)
 [1]  2  3  4  5  6  7  8  9 10 NA

So if you combine that notion with a library like plyr that splits data nicely you should have a workable solution. If I've missed something or this doesn't solve your problem, please provide a reproducible example.

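library(plyr)

# Shift a vector by one position and pad with NA.
# Note: c(x[-1], NA) pairs each row with the NEXT value in the vector.
myLag <- function(x) {
  c(x[-1], NA)
}

# Group by the person id column (RID, as in the question's subset() call)
# and add the shifted column within each group
ddply(part2test, .(RID), transform, EXAMDATE_LAG=myLag(EXAMDATE))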
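Which direction to shift depends on what you mean by "lag". As a small sketch on made-up dates (the variable names `next_value` and `prev_value` are only for illustration), here are both directions side by side:

```r
# Made-up dates for illustration
x <- as.Date(c("2012-01-10", "2012-02-10", "2012-03-10"))

# Value from the following row (what c(x[-1], NA) produces)
next_value <- c(x[-1], NA)

# Value from the preceding row; indexing with NA keeps the Date class
prev_value <- x[c(NA, seq_len(length(x) - 1L))]

data.frame(EXAMDATE = x, NEXT = next_value, PREV = prev_value)
```

The previous-row version uses indexing rather than `c(NA, ...)` because `c()` dispatches on its first argument, so starting with a bare `NA` would drop the Date class and return plain numbers.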

You could also do this in base R using split or the data.table package using its by= argument.
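For the base R route, a rough sketch with `split()` could look like this. The data is hypothetical, the helper name `prev_value` is made up for illustration, and it assumes the rows are already in date order within each RID:

```r
# Hypothetical example data; the real part2test was not posted
part2test <- data.frame(
  RID = c(1, 1, 1, 2, 2),
  EXAMDATE = as.Date(c("2012-01-10", "2012-02-10", "2012-03-10",
                       "2012-01-15", "2012-04-15"))
)

# Previous-row value, NA for the first row of each group
prev_value <- function(x) x[c(NA, seq_len(length(x) - 1L))]

# Split into one data frame per person, add the column, then recombine
pieces <- split(part2test, part2test$RID)
pieces <- lapply(pieces, function(d) {
  d$EXAMDATE_LAG <- prev_value(d$EXAMDATE)
  d
})
result <- do.call(rbind, pieces)
rownames(result) <- NULL  # drop the "1.1", "1.2", ... names rbind creates
```

This is the same split-apply-combine pattern that `ddply` wraps, just spelled out by hand.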

Justin