0

I have a couple of columns in a data frame with times. I'm trying to calculate the difference in times in a new column, but I need to reset it to zero every time I encounter the start of a new pattern. Please see some sample data below.

Seq  ATime      RT
0    18:33:00   0
20   18:48:00   15
43   19:01:00   13
56   19:47:00   24
0    21:33:00   0
9    21:45:00   12
22   21:55:00   10
45   22:13:00   18
0    06:33:00   0
22   06:47:00   14
45   06:59:00   12
62   07:22:00   23
85   07:48:00   26

I'm using the following script to estimate the delta column. The Seq column is always increasing for each 'pattern'. In this sample each pattern's Seq starts from 0, but it may not be the case always.

dat_4$RT <- 0
for (i in 1:(NROW(dat_4$Seq)-1)) {
  if (dat_4$Seq[i+1] > dat_4$Seq[i]) {
    dat_4$RT[i+1] = (chron(times=dat_4$ATime[i+1]) - chron(times=dat_4$ATime[i]))*1440
  } else {
    dat_4$RT[i+1] = 0
  }
}

Although it works, it's not at all efficient. Sometimes the 'dat_4' dataframe will have roughly 4 million records and it takes almost 2.5 minutes just to process this step.

   user  system elapsed 
  96.86   54.07  150.99 

Any suggestion on how I can make it more efficient?

jesstme
  • 604
  • 2
  • 10
  • 25
Sujith
  • 15
  • 5
  • 2
    Make sure your example is [reproducible](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). You code uses a `Seq` column but that doesn't seem to appear anywhere in the sample data. – MrFlick Aug 21 '17 at 20:14
  • Updated to include the `Seq` column. – Sujith Aug 21 '17 at 20:21
  • 1
    Your code also uses `ATime` and `DTime` columns which are not in sample data. – Eric Watt Aug 21 '17 at 20:29
  • `ATime` and `DTime` are basically times, just like the `Time` column. But, I've updated the code and data. – Sujith Aug 21 '17 at 20:34
  • You may use the [`cumsum()` idiom](https://stackoverflow.com/questions/5222061/how-to-partition-a-vector-into-groups-of-regular-consecutive-sequences) to create a grouping variable, and then use `ave` to calculate time `diff`erences within group: `ave(chron(times = d$Time), cumsum(c(1, diff(d$Seq) < 0)), FUN = function(x) c(0, diff(x)))` – Henrik Aug 21 '17 at 21:09

2 Answers2

0

You can do this using the (built-in) difftime function, as well as lag and mutate from dplyr:

library(dplyr)
dat_4 %>%
  mutate(RT = as.numeric(difftime(ATime, lag(ATime, default = ATime[1]), units = "min")),
         RT = RT * (Seq > lag(Seq, default = 0)))

The first step computes the difference (in minutes) between each pair, with the first starting at zero. The second turns all the cases where Seq isn't increasing to 0.

David Robinson
  • 77,383
  • 16
  • 167
  • 187
0

You can do this by first calculating RT for all rows, and then finding all of the rows that should be set to 0. This avoids the for loop, and likely could be faster.

dat_4$RT <- c(0, diff(chron(times=dat_4$ATime)) * 1440)
dat_4$RT[which(sign(diff(dat_4$Seq)) == -1) + 1] <- 0

The first line will do the diff using chron similar to how you did, but avoids doing this in a loop. The second line detects when Seq has decreased, and sets those rows to have a RT of zero.

Eric Watt
  • 3,180
  • 9
  • 21