I am trying to loop over a data frame and find sequences of events bounded by a start event and a stop event (one event marks the beginning of a sequence and another marks its end).
Here is some sample data:
time = c('8:20', '8:19', '8:15', '8:14', '8:14', '8:10', '8:04', '8:03', '8:00', '7:59', '7:55', '7:44', '7:43','7:42')
action = c('A', 'B', 'C', 'B', 'F', 'T', 'Z', 'U', 'A', 'G', 'B', 'C', 'L', 'Z')
group = c('group1', 'group1', 'group1', 'group2', 'group1', 'group1', 'group2', 'group2','group2', 'group2', 'group2', 'group2', 'group1', 'group1')
library(dplyr)
test.df = cbind(time, action, group) %>% data.frame()
The full data set is longer and wider, but this should suffice.
The rules are as follows: when one group (either group1 or group2) registers action 'A', and only 'A', that starts a sequence. Any number of events can follow until the opposite group (group2 if group1 initiated 'A', or group1 in the reverse case) logs action 'Z'. Action 'Z' by the opposite group marks the end of the sequence.
This process repeats hundreds of times over the data frame.
Each time one of the groups starts a sequence with action 'A', I want every subsequent event to be tagged with an ID that increments each time that group starts a new sequence, until action 'Z' is taken by the opposite group.
I.e., in the sample above, a new column would identify 'group1' as the group the sequence belongs to, with ID 1; that group's next sequence later in the data set would be ID 2 for group1, and so on.
time action group group.sequence id
8:20 A group1 group1 1
8:19 B group1 group1 1
8:15 C group1 group1 1
8:14 B group2 group1 1
8:14 F group1 group1 1
[...]
That way, I can compute the total time, the number of actions in between, and the types of actions in between. Any actions that occur outside a group's 'A' to 'Z' span (for example, row 8) can be ignored for now.
I would prefer something I can use in my dplyr pipe, but I am open to any solution that works.
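To make the rules concrete, here is a minimal sketch of them as a single pass over the rows, in base R. The helper name `tag_sequences` is hypothetical. It rests on a few assumptions that the description leaves open: rows are processed in the order they appear, the closing 'Z' row is counted as part of the sequence, and an 'A' logged while a sequence is already open is treated as an ordinary event.

```r
# Sample data from the question, processed in the order given.
time   <- c('8:20', '8:19', '8:15', '8:14', '8:14', '8:10', '8:04',
            '8:03', '8:00', '7:59', '7:55', '7:44', '7:43', '7:42')
action <- c('A', 'B', 'C', 'B', 'F', 'T', 'Z', 'U', 'A', 'G', 'B', 'C', 'L', 'Z')
group  <- c('group1', 'group1', 'group1', 'group2', 'group1', 'group1', 'group2',
            'group2', 'group2', 'group2', 'group2', 'group2', 'group1', 'group1')
test.df <- data.frame(time, action, group, stringsAsFactors = FALSE)

# Hypothetical helper: tags each row with the group that owns the open
# sequence and that group's running sequence ID (NA outside any sequence).
tag_sequences <- function(df) {
  n <- nrow(df)
  group.sequence <- rep(NA_character_, n)
  id <- rep(NA_integer_, n)
  counts <- integer(0)        # named counter: sequences started per group
  owner <- NA_character_      # group that opened the current sequence, if any

  for (i in seq_len(n)) {
    g <- as.character(df$group[i])
    a <- as.character(df$action[i])

    # Assumption: 'A' only opens a sequence when none is currently open.
    if (is.na(owner) && a == "A") {
      owner <- g
      prev <- if (g %in% names(counts)) counts[[g]] else 0L
      counts[g] <- prev + 1L
    }

    if (!is.na(owner)) {
      group.sequence[i] <- owner
      id[i] <- counts[[owner]]
      # Assumption: the closing 'Z' row belongs to the sequence it ends.
      if (a == "Z" && g != owner) owner <- NA_character_
    }
  }

  df$group.sequence <- group.sequence
  df$id <- id
  df
}

result <- tag_sequences(test.df)
```

On the sample data this tags rows 1 to 7 as group1 with ID 1, leaves row 8 as NA, and tags rows 9 to 14 as group2 with ID 1. Because the helper takes and returns a data frame, it should slot into a pipe, e.g. `test.df %>% tag_sequences() %>% filter(!is.na(id))`.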