My data.frame
is structured as follows:
Prod.order Operation Timestamp
1: 126 B100 2022-08-15 10:30:00
2: 126 B100 2022-08-15 10:33:00
3: 126 B100 2022-08-15 10:35:00
4: 126 D200 2022-08-15 10:40:00
5: 126 D200 2022-08-15 10:45:00
6: 126 B100 2022-08-15 10:50:00
7: 126 B100 2022-08-15 10:52:00
8: 126 B100 2022-08-15 10:55:00
Prod.order
is an id variable. For each consecutive run of a particular Operation
, I want to find the earliest and latest Timestamp
. In the above case, my desired output is:
Prod.order Operation earliest latest
1: 126 B100 2022-08-15 10:30:00 2022-08-15 10:35:00
2: 126 D200 2022-08-15 10:40:00 2022-08-15 10:45:00
3: 126 B100_2 2022-08-15 10:50:00 2022-08-15 10:55:00
Note that because there were two runs of consecutive B100
s (separated by a run of consecutive D200
s, B100
shows up here twice (with the second pass denoted by _2
).
My actual data.frame
has a similar set-up, but for multiple values of Prod.order
(see bottom of post). Given this, I need to (1) detect the different consecutive runs of Operation
s within each Prod.order
, and (2) find the earliest and latest Timestamp
per Operation
per Prod.order
.
Example data below: df_current
is raw data, df_desired
is desired output.
p.s. solution that can be piped are extra awesome xD
Thanks!
df_current <- data.frame(
Prod.order = c(rep(123, 23), rep(124, 4), rep(125, 3), rep(126, 8)),
Operation = c(
rep('B100', 6), rep('D200', 6), rep('B100', 5), rep('D300', 6),
rep('B100', 4),
rep('B100', 1), rep('D200', 1), rep('D300', 1),
rep('B100', 3), rep('D200', 2), rep('B100', 3)),
Timestamp = as.POSIXct(c(
"2022-08-15 10:00:00", "2022-08-15 10:01:00", "2022-08-15 10:02:00", "2022-08-15 10:03:00", "2022-08-15 10:04:00", "2022-08-15 10:05:00",
"2022-08-15 10:10:00", "2022-08-15 10:11:00", "2022-08-15 10:12:00", "2022-08-15 10:13:00", "2022-08-15 10:14:00", "2022-08-15 10:15:00",
"2022-08-15 10:20:00", "2022-08-15 10:21:00", "2022-08-15 10:23:00", "2022-08-15 10:24:00", "2022-08-15 10:25:00",
"2022-08-15 10:30:00", "2022-08-15 10:31:00", "2022-08-15 10:32:00", "2022-08-15 10:33:00", "2022-08-15 10:34:00", "2022-08-15 10:35:00",
"2022-08-15 10:10:00", "2022-08-15 10:12:00", "2022-08-15 10:14:00", "2022-08-15 10:15:00",
"2022-08-15 10:20:00",
"2022-08-15 10:30:00",
"2022-08-15 10:40:00",
"2022-08-15 10:30:00", "2022-08-15 10:33:00", "2022-08-15 10:35:00",
"2022-08-15 10:40:00", "2022-08-15 10:45:00",
"2022-08-15 10:50:00", "2022-08-15 10:52:00", "2022-08-15 10:55:00"),
format= "%Y-%m-%d %H:%M:%S")
)
df_desired <- data.frame(
Prod.order = c(rep(123,4), 124, rep(125,3), rep(126,3)),
operation = c(
'B100', 'D200', 'B100_2', 'D300',
'B100',
'B100', 'D200', 'D300',
'B100', 'D200', 'B100_2'),
earliest = as.POSIXct(c(
"2022-08-15 10:00:00", "2022-08-15 10:10:00", "2022-08-15 10:20:00", "2022-08-15 10:30:00",
"2022-08-15 10:10:00",
"2022-08-15 10:20:00", "2022-08-15 10:30:00", "2022-08-15 10:40:00",
"2022-08-15 10:30:00", "2022-08-15 10:40:00", "2022-08-15 10:50:00"),
format= "%Y-%m-%d %H:%M:%S"),
latest = as.POSIXct(c(
"2022-08-15 10:05:00", "2022-08-15 10:15:00", "2022-08-15 10:25:00", "2022-08-15 10:35:00",
"2022-08-15 10:15:00",
"2022-08-15 10:20:00", "2022-08-15 10:30:00", "2022-08-15 10:40:00",
"2022-08-15 10:35:00", "2022-08-15 10:45:00", "2022-08-15 10:55:00"),
format= "%Y-%m-%d %H:%M:%S")
)