I can't figure out how to plot frequency against sequential durations in R. I have several processes (X, Y, Z, ...) that each consist of multiple steps (a, b, c, d, ...). Each process has a different sequence of steps: X may consist of aabaacabcd, Y may consist of acdad, and so on. There are timestamps for the beginning and end of every step, which I used to calculate each step's duration. The data frame looks something like this:
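| ProcessID | Step | Start            | End              | Seconds |
|-----------|------|------------------|------------------|---------|
| X         | a    | 30.09.2022 14:08 | 30.09.2022 14:11 | 165     |
| X         | d    | 30.09.2022 14:11 | 30.09.2022 14:24 | 756     |
| Y         | a    | 29.09.2022 11:55 | 29.09.2022 13:16 | 4876    |
| Y         | c    | 29.09.2022 13:16 | 29.09.2022 14:26 | 4199    |
| Y         | d    | 29.09.2022 14:26 | 30.09.2022 17:17 | 96654   |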
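For anyone who wants to reproduce this, here is a minimal base-R sketch that rebuilds the sample rows and parses `Start`/`End` (day.month.year hour:minute) into date-times. The UTC time zone is an assumption, chosen only to avoid daylight-saving jumps; the `Duration` column is a hypothetical helper. The recomputed durations differ slightly from `Seconds` (e.g. 180 vs 165 for X/a), presumably because the timestamps shown here are rounded to the minute.

```r
# Rebuild the five sample rows from the question.
df <- data.frame(
  ProcessID = c("X", "X", "Y", "Y", "Y"),
  Step      = c("a", "d", "a", "c", "d"),
  Start     = c("30.09.2022 14:08", "30.09.2022 14:11", "29.09.2022 11:55",
                "29.09.2022 13:16", "29.09.2022 14:26"),
  End       = c("30.09.2022 14:11", "30.09.2022 14:24", "29.09.2022 13:16",
                "29.09.2022 14:26", "30.09.2022 17:17"),
  Seconds   = c(165, 756, 4876, 4199, 96654),
  stringsAsFactors = FALSE
)

# Parse "dd.mm.yyyy hh:mm" strings; tz = "UTC" is an assumption.
fmt <- "%d.%m.%Y %H:%M"
df$Start <- as.POSIXct(df$Start, format = fmt, tz = "UTC")
df$End   <- as.POSIXct(df$End,   format = fmt, tz = "UTC")

# Hypothetical helper column: duration recomputed from the timestamps
# (only minute resolution, so it need not match Seconds exactly).
df$Duration <- as.numeric(difftime(df$End, df$Start, units = "secs"))
print(df[, c("ProcessID", "Step", "Seconds", "Duration")])
```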
There are around 1000 processes in the data frame. Every process begins with step 'a', which may also appear again later in the sequence. Step 'd' occurs exactly once in each sequence and closes the process. I plotted the frequency distribution of 'Seconds' for each step as a ridgeline plot, which looks like this:

(image: ridgeline plot)
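Before building a timeline it may be worth checking that the data really follow these rules. A minimal base-R sketch, assuming the column names from the sample table and that the steps within a process can be put in order by `Start`; the names `paths` and `valid` are only illustrative:

```r
# Sample rows from the question (End/Seconds are not needed for this check).
df <- data.frame(
  ProcessID = c("X", "X", "Y", "Y", "Y"),
  Step      = c("a", "d", "a", "c", "d"),
  Start     = as.POSIXct(c("30.09.2022 14:08", "30.09.2022 14:11",
                           "29.09.2022 11:55", "29.09.2022 13:16",
                           "29.09.2022 14:26"),
                         format = "%d.%m.%Y %H:%M", tz = "UTC"),
  stringsAsFactors = FALSE
)

# Put the steps of each process in chronological order.
df <- df[order(df$ProcessID, df$Start), ]
seqs <- split(df$Step, df$ProcessID)

# One string per process, e.g. "acd".
paths <- vapply(seqs, paste, character(1), collapse = "")
print(paths)

# TRUE if the process starts with 'a' and its only 'd' is the last step.
valid <- vapply(seqs, function(s) {
  s[1] == "a" && sum(s == "d") == 1 && s[length(s)] == "d"
}, logical(1))
print(valid)
```

On the full data, `table(paths)` would show which step sequences are most common, and `names(valid)[!valid]` would list the processes that break the rules.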
This, however, treats 'Seconds' as a single data point rather than as a duration, and it also ignores the fact that steps b, c, d, etc. usually (though not always) follow step a.
I would like to plot frequency against a timeline that shows how many processes are in step a/b/c/d at each time point. For example, at 20 seconds on the timeline there are 32 processes in step a, 49 processes in step c, etc. I'm struggling with defining the durations. 'Seconds' gives me only one value per step, not the sequence of steps. 'Start' and 'End' do define the sequence and the durations, but they are not normalized so that every process starts at 0, so I don't know how to compare the processes.
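One possible approach, sketched in base R under a few assumptions: subtract each process's earliest `Start` so every process begins at 0, then evaluate a regular grid of time points and count, per step, how many step intervals contain each point. The 60-second grid spacing is an arbitrary choice, intervals are treated as half-open `[start, end)`, and counting rows equals counting processes only if the steps of one process never overlap in time. The names `t0`, `rel_start`, `rel_end`, `grid` and `counts` are hypothetical.

```r
# Sample rows from the question, Start/End parsed from day.month.year.
fmt <- "%d.%m.%Y %H:%M"
df <- data.frame(
  ProcessID = c("X", "X", "Y", "Y", "Y"),
  Step      = c("a", "d", "a", "c", "d"),
  Start = as.POSIXct(c("30.09.2022 14:08", "30.09.2022 14:11", "29.09.2022 11:55",
                       "29.09.2022 13:16", "29.09.2022 14:26"), format = fmt, tz = "UTC"),
  End   = as.POSIXct(c("30.09.2022 14:11", "30.09.2022 14:24", "29.09.2022 13:16",
                       "29.09.2022 14:26", "30.09.2022 17:17"), format = fmt, tz = "UTC"),
  stringsAsFactors = FALSE
)

# Normalize: seconds since the first Start of the same process.
t0 <- ave(as.numeric(df$Start), df$ProcessID, FUN = min)
df$rel_start <- as.numeric(df$Start) - t0
df$rel_end   <- as.numeric(df$End)   - t0

# Common timeline, evaluated every 60 seconds (arbitrary resolution).
grid  <- seq(0, max(df$rel_end), by = 60)
steps <- sort(unique(df$Step))

# counts[i, s] = number of step-s intervals [rel_start, rel_end) containing grid[i].
counts <- sapply(steps, function(s) {
  d <- df[df$Step == s, ]
  vapply(grid, function(t) sum(d$rel_start <= t & t < d$rel_end), integer(1))
})

# One step-shaped line per step: processes in that step vs. time since start.
matplot(grid, counts, type = "s", lty = 1, col = seq_along(steps),
        xlab = "Seconds since process start", ylab = "Number of processes")
legend("topright", legend = steps, col = seq_along(steps), lty = 1)
```

With the sample rows, both processes are in step a at 0 seconds, while at 240 seconds X is already in d and Y is still in a. If the steps of a process follow each other without gaps, `cumsum(Seconds)` within each process would give the same relative end times at second resolution instead of minute resolution. Since durations range from minutes to more than a day, a coarser grid or a limited `xlim` may be more readable on the full data.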
Can anyone help? Thanks in advance.