0

Would like compute the elapsed time between action A and action X. There is different logic for other actions (B,C etc.) but they involve doing similar time differences.

Is there a way to use computed value for offset in shift? I have used shift in the past for fixed offset e.g. using lag to go back 12 rows within a group but can't figure how to incorporate conditional offset in shift

Here's a made up example:

x <- data.table(Case = c(1,1,1,1,2,2,2,3,3,3,3,3), Action = c("A","B","C","X","A","X","X","A","C","X","A","X"), ActionTime = parse_date_time(c("1/23/2020 12:55","1/26/20 3:23","1/28/2020 4:23","4/16/2020 17:50","1/25/2020 23:04","2/12/2020 17:50","2/13/2020 17:50","1/26/2020 3:23","2/18/2020 21:23","2/18/2020 21:27","3/15/2020 3:23","3/18/2020 21:27"), orders=c('mdy HM')))
setkeyv(x, c("Case", "ActionTime"))

    > x
    Case Action          ActionTime
 1:    1      A 2020-01-23 12:55:00
 2:    1      B 2020-01-26 03:23:00
 3:    1      C 2020-01-28 04:23:00
 4:    1      X 2020-04-16 17:50:00
 5:    2      A 2020-01-25 23:04:00
 6:    2      X 2020-02-12 17:50:00
 7:    2      X 2020-02-13 17:50:00
 8:    3      A 2020-01-26 03:23:00
 9:    3      C 2020-02-18 21:23:00
10:    3      X 2020-02-18 21:27:00
11:    3      A 2020-03-15 03:23:00
12:    3      X 2020-03-18 21:27:00

Case 1 is the simple case, Case 2 when Action X shows multiple times but I need the min value of time and Group 3 where A and X show multiple times within the same case.

I am looking to get XTime = the timestmp of the closest X after the occurrence of A within the case:

    Case Action          ActionTime          XTime
 1:    1      A 2020-01-23 12:55:00          2020-04-16 17:50:00
 2:    1      B 2020-01-26 03:23:00
 3:    1      C 2020-01-28 04:23:00
 4:    1      X 2020-04-16 17:50:00
 5:    2      A 2020-01-25 23:04:00          2020-02-12 17:50:00
 6:    2      X 2020-02-12 17:50:00
 7:    2      X 2020-02-13 17:50:00
 8:    3      A 2020-01-26 03:23:00          2020-02-18 21:27:00
 9:    3      C 2020-02-18 21:23:00
10:    3      X 2020-02-18 21:27:00
11:    3      A 2020-03-15 03:23:00          2020-03-18 21:27:00
12:    3      X 2020-03-18 21:27:00

Appreciate any help

Thanks

  • Welcome to SO! Your chances of getting an answer will greatly increase if you provide a simple, self-contained example of your problem. This would include toy input data, your code, any error messages, and expected output. [This post](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610) may be helpful. – Limey Jun 20 '20 at 13:57
  • Hi Murali, welcome to Stack Overflow. I agree with Limey that it's not clear what your expected output is. I edited your question to provide the data with better formatting. I would recommend to copy that table and [edit] your question with a second copy that provides your expected output. – Ian Campbell Jun 20 '20 at 14:59

1 Answers1

1

Here's an approach with a rolling join.

First, we subset the data on Action == "A" and Action == "X" and join the two subsets onto each other. We use on = c("Case","Time") to join on cases that are the same and then time. In data.table, you can only roll on the last join condition. We then use roll = Inf to roll forward. For some reason, the column you roll on is combined during the join, so we create and extra copy called InitialTime.

The rolling join rolls forward to all possible value in the positive direction, so then we subset by Case to the minimum Time for all combinations of Case and InitialTime.

library(data.table)
data[Action == "A",.(Case,Action,Time,InitialTime=Time)][
  data[Action == "X",], on = c("Case","Time"), roll = Inf][
    ,.SD[which.min(Time),.(XTime=Time)],by = .(Case,InitialTime)]
   Case         InitialTime               XTime
1:    1 2020-01-23 12:55:00 2020-04-16 17:50:00
2:    2 2020-01-25 23:04:00 2020-02-12 17:50:00
3:    3 2020-01-26 03:23:00 2020-02-18 21:27:00
4:    3 2020-03-15 03:23:00 2020-03-18 21:27:00

Sample Data

data <- structure(list(Case = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 
3L, 3L), Action = structure(c(1L, 2L, 3L, 4L, 1L, 4L, 4L, 1L, 
3L, 4L, 1L, 4L), .Label = c("A", "B", "C", "X"), class = "factor"), 
    Time = structure(c(1579802100, 1580026980, 1580203380, 1587073800, 
    1580011440, 1581547800, 1581634200, 1580026980, 1582078980, 
    1582079220, 1584256980, 1584581220), class = c("POSIXct", 
    "POSIXt"), tzone = "")), row.names = c(NA, -12L), class = c("data.table", 
"data.frame"))
Ian Campbell
  • 23,484
  • 14
  • 36
  • 57
  • 1
    Another possibility may be `data[Action == "A", XTime := data[Action == "X"][.SD, on = c("Case","ActionTime"), x.ActionTime, roll = -Inf]]` – Henrik Jun 20 '20 at 16:00
  • Thank you. Found this very helpful. Introduced me to rolling joins, which is incredibly powerful – Murali Narayanaswamy Jun 20 '20 at 16:05
  • @MuraliNarayanaswamy Glad it helped you. Also note that there is `roll = "nearest"`, which comes in handy quite frequently. – Ian Campbell Jun 20 '20 at 16:06