
I have two columns that look like this:

user_id  timestamp
3162507 "2016-11-15 21:26:58" 
3162507 "2016-11-15 21:28:13"
3180468 "2016-11-15 21:28:58"
3180468 "2016-11-15 21:29:47"
3180479 "2016-11-15 21:31:22"
3180479 "2016-11-15 21:31:35" ...

I want to calculate the time elapsed between consecutive activities of the same user. Currently I'm doing it with a loop, but loops in R are slow.

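# Seconds until the same user's next activity; NA on each user's last row
df$time <- NA_real_
for (i in seq_len(nrow(df) - 1)) {  # stop one row early so i + 1 stays in range
  if (df$user_id[i] == df$user_id[i + 1]) {
    df$time[i] <- as.numeric(difftime(df$timestamp[i + 1], df$timestamp[i], units = "secs"))
  }
}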

Is there a better way to do it?

dank

1 Answer

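library(data.table)
setDT(df)  # converts df to a data.table by reference
df[, timestamp := as.POSIXct(timestamp, tz = "UTC")]  # parse the strings; UTC is an assumption

# Seconds to the same user's next activity; NA on each user's last row
result <- df[, list(time = as.numeric(difftime(shift(timestamp, type = "lead"), timestamp, units = "secs"))), by = user_id]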

This should give the differences between successive timestamps within each user_id.
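
For comparison, here is a minimal base R sketch of the same grouped successive-difference idea, without data.table. It rebuilds the six sample rows from the question and assumes the timestamps can be parsed as UTC (the time zone is an assumption; it does not affect the differences). It uses ave() with diff() as one possible alternative, not the method from the answer above.

```r
# Sample rows copied from the question
df <- data.frame(
  user_id = c(3162507, 3162507, 3180468, 3180468, 3180479, 3180479),
  timestamp = c("2016-11-15 21:26:58", "2016-11-15 21:28:13",
                "2016-11-15 21:28:58", "2016-11-15 21:29:47",
                "2016-11-15 21:31:22", "2016-11-15 21:31:35"),
  stringsAsFactors = FALSE
)

# Parse the strings into date-times (UTC is an assumption)
df$timestamp <- as.POSIXct(df$timestamp, tz = "UTC")

# Within each user_id, take the gap in seconds to the next row;
# the last row of each user (or a user with one row) gets NA
df$time <- ave(as.numeric(df$timestamp), df$user_id,
               FUN = function(x) c(diff(x), NA))

print(df$time)  # 75 NA 49 NA 13 NA
```

Because c(diff(x), NA) always has the same length as x, this also copes with users who have a single activity.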

denis
  • I was about to update my example. The number of activities per user varies; it could be 1 or 100. – dank Jan 18 '18 at 15:09
  • @italo Then, what difference do you want? Between all consecutive times or first and last? – Dan Jan 18 '18 at 15:10
  • @Lyngbakr Consecutive. Sotos found a duplicate question. https://stackoverflow.com/questions/48324212/how-to-calculate-timestamp-difference-in-a-loop-r – dank Jan 18 '18 at 15:12
  • I updated my answer to compute consecutive differences. – denis Jan 18 '18 at 15:17