
I have a data set that looks like this:

ID   |   DATE    | SCORE
-------------------------
123  |  1/15/10  |  10
123  |  1/1/10   |  15
124  |  3/5/10   |  20
124  |  1/5/10   |  30
...

To load the snippet above as a data frame:

id<-c(123,123,124,124)
date<-as.Date(c('2010-01-15','2010-01-01','2010-03-05','2010-01-05'))
score<-c(10,15,20,30)
data<-data.frame(id,date,score)


I'm trying to add a column that calculates the "days since last record for this ID".

Right now I'm using a `for` loop that looks something like this:

data$dayssincelast <- rep(NA, nrow(data))
for(i in 2:nrow(data)) {
  # only compute a gap when the previous row belongs to the same id
  if(data$id[i] == data$id[i-1]) 
    data$dayssincelast[i] <- data$date[i] - data$date[i-1]
}


Is there a faster way to do this? (I've looked a bit into `apply` but can't figure out a solution other than a `for` loop.)

Thanks in advance!

Gavin Simpson
Dave Guarino
  • Please add to your question the output of `dput(head(data))`. Your dates don't look like something you can subtract – GSee Nov 27 '12 at 19:54
  • There are many ways to approach the split-apply piece, but all of them will probably end up using `diff`. – joran Nov 27 '12 at 19:56
  • @GSee - I did not show it, but I converted the dates already using as.Date(). The above is just dummy data to illustrate the structure. – Dave Guarino Nov 27 '12 at 22:21
  • @Dave, you'll get better Answers if you make your Questions [reproducible](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – GSee Nov 27 '12 at 23:26
  • Thank you, @GSee - I've edited the question to make it reproducible. (I'm new to R on SO, so appreciate the pointer! :D ) – Dave Guarino Nov 28 '12 at 02:29
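Since the comments hinge on whether the dates can be subtracted, here is a minimal illustration, assuming the dates have already been converted with `as.Date()` as described above: subtracting two `Date` objects gives a `difftime` measured in days, and `diff()` gives the gaps between consecutive dates. The names `d` and `gap` are illustrative only.

```r
# Two of the sample dates for id 123, converted with as.Date()
d <- as.Date(c("2010-01-01", "2010-01-15"))

# Subtracting Date objects returns a "difftime" measured in days
gap <- d[2] - d[1]
print(gap)          # Time difference of 14 days
as.numeric(gap)     # plain number of days: 14

# diff() returns the gaps between consecutive elements of a Date vector
as.numeric(diff(as.Date(c("2010-01-05", "2010-03-05"))))  # 59
```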

3 Answers


This should work as long as the dates are in order within each id, which the `order()` call below ensures.

id<-c(123,123,124,124)
date<-as.Date(c('2010-01-15','2010-01-01','2010-03-05','2010-01-05'))
score<-c(10,15,20,30)
data<-data.frame(id,date,score)

data <- data[order(data$id,data$date),]
data$dayssincelast<-do.call(c,by(data$date,data$id,function(x) c(NA,diff(x))))
# Or, even more concisely
data$dayssincelast<-unlist(by(data$date,data$id,function(x) c(NA,diff(x))))
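A related base-R sketch swaps `by()` for `ave()`, which returns a vector already aligned with the rows, so no `do.call`/`unlist` step is needed. It still assumes the rows are sorted by date within each id. The dates are converted to numbers (days since 1970-01-01) first because `ave()` writes its results back into a copy of its first argument, so passing the `Date` column directly would not yield plain day counts.

```r
# Rebuild the sample data from the question
id <- c(123, 123, 124, 124)
date <- as.Date(c("2010-01-15", "2010-01-01", "2010-03-05", "2010-01-05"))
score <- c(10, 15, 20, 30)
data <- data.frame(id, date, score)

# Dates must be in ascending order within each id
data <- data[order(data$id, data$date), ]

# ave() applies FUN within each id group and returns a row-aligned vector;
# the first record of each id gets NA
data$dayssincelast <- ave(as.numeric(data$date), data$id,
                          FUN = function(x) c(NA, diff(x)))
data
```

For the sample data this should give `NA, 14, NA, 59` in the sorted order.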
nograpes

How does the following work for you?

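indx <- which(data$id == c(data$id[-1], NA))  # rows whose next row has the same id
data$dayssincelast <- NA
data$dayssincelast[indx] <- data$date[indx] - data$date[indx+1]  # dates descend within id here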



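This shifts the ids up by one row and compares them with the original ids to find neighboring rows that share an id.
Then, for the date subtraction, subtract the date of the following row from the date of each matching row. This gives positive gaps when the dates run in descending order within each id, as they do in the sample data.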
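The same shift-and-compare idea can also put each gap on the later record, which is what the asker's loop does. A minimal sketch, assuming the rows are first sorted into ascending date order within each id; `days`, `prev_days` and `same_id` are illustrative names, not part of any answer:

```r
# Rebuild the sample data from the question
id <- c(123, 123, 124, 124)
date <- as.Date(c("2010-01-15", "2010-01-01", "2010-03-05", "2010-01-05"))
score <- c(10, 15, 20, 30)
data <- data.frame(id, date, score)

# Sort so each id's records are contiguous and in ascending date order
data <- data[order(data$id, data$date), ]

n <- nrow(data)
days <- as.numeric(data$date)                     # days since 1970-01-01
prev_days <- c(NA, days[-n])                      # previous row's value, shifted down one
same_id <- c(FALSE, data$id[-1] == data$id[-n])   # TRUE when the previous row has the same id

# Gap to the previous record of the same id; NA for the first record of each id
data$dayssincelast <- ifelse(same_id, days - prev_days, NA)
data
```

This is a vectorized version of the loop in the question, so it avoids the per-row `if`.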

Ricardo Saporta

If you need a more complex formula, you can use `aggregate`:

a <- aggregate(date ~ id, data=data, FUN=function(x) c(NA,diff(x)))
data$dayssincelast <- c(t(a[-1]), recursive=TRUE) # Remove 'id' column

As with @nograpes's answer, the rows must first be sorted by id and then by date.
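For a more general per-group formula, another option is plain `split()`/`lapply()`/`unsplit()`. Because `unsplit()` returns each result to the row it came from, rows of different ids may be interleaved; only the date order within each id matters. A sketch under that assumption, where `gap_fun` is a hypothetical placeholder for whatever formula you need:

```r
# Rebuild the sample data from the question
id <- c(123, 123, 124, 124)
date <- as.Date(c("2010-01-15", "2010-01-01", "2010-03-05", "2010-01-05"))
score <- c(10, 15, 20, 30)
data <- data.frame(id, date, score)

# Only the date order matters here; ids may be interleaved
data <- data[order(data$date), ]

# Hypothetical per-group function: gap in days to the previous record.
# Replace the body with a more complex formula as needed.
gap_fun <- function(x) c(NA, diff(x))

days <- as.numeric(data$date)
# Split by id, apply gap_fun to each group, then put every value back in its row
data$dayssincelast <- unsplit(lapply(split(days, data$id), gap_fun), data$id)
data
```

With the sample data sorted by date alone, the new column should read `NA, NA, 14, 59`.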

Matthew Lundberg