0

Below is a table that I have in R. (event log)

Case.ID | Activity | Timestamp |    Resource   |   State
------------------------------------------------------------
   0    |Take order| 00:12:04  |     Waiter    |   Assign
------------------------------------------------------------
   0    |Take order| 00:18:02  |               |  Complete
------------------------------------------------------------
   1    |Bring food| 00:47:23  |  Cook helper  |   Assign
------------------------------------------------------------
   1    |Bring food| 00:52:41  |               |  Complete
------------------------------------------------------------
   1    |Bring food| 00:54:52  | Cook helper   |   Assign
------------------------------------------------------------
   1    |Bring food| 00:59:11  |               |  Complete

Rows with value in the column Resource is a start of the activity and end of the activity has no value in that cell.

I tried,

assign <- subset(csv, select=c(Case.ID,Activity,Timestamp,State), State=="assign")
complete <- subset(csv, select=c(Case.ID,Activity,Timestamp,State), State=="complete")
merge <- merge(assign, complete, by=c("Case.ID", "Activity"))

but it shows some error like below.

Second and fourth rows should be eliminated but not sure the way to do it.

Case.ID | Activity | Start.Timestamp | End.Timestamp |
------------------------------------------------------
   1    |Bring food|    00:47:23     |    00:52:41   |
------------------------------------------------------
   1    |Bring food|    00:47:23     |    00:59:11   | 
------------------------------------------------------
   1    |Bring food|    00:54:52     |    00:52:41   |
------------------------------------------------------
   1    |Bring food|    00:54:52     |    00:59:11   |
Edward M.
  • 221
  • 1
  • 5
  • 13

1 Answers1

0

Create an extra grouping variable and then reshape to wide format like this:

df$grp <- cumsum(df$Resource!='')

library(reshape2)
df2 <- dcast(df, Case.ID + Activity + grp ~ State, value.var = 'Timestamp')[,-3]

which gives:

> df2
  Case.ID   Activity   assign complete
1       0 Take order 00:12:04 00:18:02
2       1 Bring food 00:47:23 00:52:41
3       1 Bring food 00:54:52 00:59:11

An alternative with data.table:

library(data.table)
df2 <- dcast(setDT(df)[, grp := cumsum(Resource!='')], 
             Case.ID + Activity + grp ~ State, value.var = 'Timestamp')[, grp := NULL][]
Jaap
  • 81,064
  • 34
  • 182
  • 193