Consider the following data frame:
                                     Value
time
2020-02-14 14:16:10.769999872+00:00     74
2020-02-14 14:16:11.360999936+00:00     74
2020-02-14 14:16:11.970000128+00:00     72
2020-02-14 14:16:12.637000192+00:00     72
2020-02-14 14:16:13.210000128+00:00     74
...                                    ...
2020-02-28 08:15:20.340000+00:00        71
2020-02-28 08:15:20.890000128+00:00     71
2020-02-28 08:15:21.424000+00:00        71
2020-02-28 08:15:22.032999936+00:00     72
2020-02-28 08:15:22.594000128+00:00     72
I would like my code to go through the Value column, find the start and end index of each value, and save this information in a dictionary:
results = {74: {start: 2020-02-14 14:16:10.769999872+00:00, end: 2020-02-14 14:16:11.360999936+00:00},
           72: {start: ..., end: ...},
           ...}
That alone would be too simple. The tricky part is that one or more values may appear several times in non-consecutive runs, for example:
74, 74, 72, 72, 72, 74, 74, 74, 71, 71, 71, 72, 72, 71, 71
In that case, each consecutive run of a value should become its own sequence entry under that value, holding the start and end index of the run:
results = {74:
               {Sequence1: {start: 2020-02-14 14:16:10.769999872+00:00, end: 2020-02-14 14:16:11.360999936+00:00},
                Sequence2: {start: ..., end: ...}},
           72:
               {Sequence1: {start: ..., end: ...},
                Sequence2: {start: ..., end: ...},
                Sequence3: {start: ..., end: ...}},
           71: ...,
          }
Naturally, I could code this with lots of for-loops, but I was wondering whether there is a neater, more clever solution that would spare me the faff. Most importantly, the code has to be fast: the data frame has around 300,000 rows.
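To make the target structure concrete, here is a minimal sketch of one possible loop-light approach, not a definitive solution. It assumes the column is called `Value` and the index holds the timestamps, as in the frame above. The function name `find_runs` and the sample timestamps are made up for illustration. The idea is to compare each value with its predecessor in NumPy to find where a run begins, then loop only over the runs rather than over every row.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: the example sequence from the question,
# with made-up timestamps one second apart.
idx = pd.Timestamp("2020-02-14 14:16:10", tz="UTC") + pd.to_timedelta(np.arange(15), unit="s")
df = pd.DataFrame(
    {"Value": [74, 74, 72, 72, 72, 74, 74, 74, 71, 71, 71, 72, 72, 71, 71]},
    index=idx,
)


def find_runs(df, col="Value"):
    """Return {value: {"SequenceN": {"start": ts, "end": ts}}} for each consecutive run."""
    vals = df[col].to_numpy()
    if len(vals) == 0:
        return {}

    # Positions where the value differs from the previous row, i.e. run starts.
    change = np.flatnonzero(vals[1:] != vals[:-1]) + 1
    starts = np.r_[0, change]
    # Each run ends one row before the next run starts; the last run ends at the last row.
    ends = np.r_[change - 1, len(vals) - 1]

    results = {}
    # This loop iterates once per run, not once per row.
    for value, start, end in zip(vals[starts].tolist(), df.index[starts], df.index[ends]):
        sequences = results.setdefault(value, {})
        sequences[f"Sequence{len(sequences) + 1}"] = {"start": start, "end": end}
    return results


results = find_runs(df)
```

With the sample sequence, `results[74]` holds two entries, `Sequence1` and `Sequence2`, each with the first and last timestamp of that run. The boundary detection is vectorized, so the cost of the Python loop depends on the number of runs rather than the number of rows. Whether that is fast enough for 300,000 rows depends on how often the value changes.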