I have a dataframe something like this:
Timestamp | Value | Type |
---|---|---|
2021-07-03 15:12:00 | 1 | 2.a |
2021-07-03 16:11:00 | 1 | 2.a |
2021-07-04 14:25:00 | 1 | 2.a |
2021-07-04 15:50:00 | 1 | 2.a |
2021-07-04 17:07:00 | 2 | 2.c |
2021-07-04 18:06:00 | 2 | 2.c |
2021-07-04 20:14:00 | 3 | 2.a |
2021-07-05 00:00:00 | 3 | 2.a |
import pandas as pd

# Parse the strings up front so the column holds real pandas Timestamps
df = pd.DataFrame({'Timestamp': pd.to_datetime(['2021-07-03 15:12:00',
                                                '2021-07-03 16:11:00',
                                                '2021-07-04 14:25:00',
                                                '2021-07-04 15:50:00',
                                                '2021-07-04 17:07:00',
                                                '2021-07-04 18:06:00',
                                                '2021-07-04 20:14:00',
                                                '2021-07-05 00:00:00']),
                   'Value': [1, 1, 1, 1, 2, 2, 3, 3],
                   'Type': ['2.a', '2.a', '2.a', '2.a', '2.c', '2.c', '2.a', '2.a']})
Here Timestamp is a DatetimeIndex, Value is some integer, and Type is some string. Every two rows represent a start/stop time pair, e.g. 2021-07-03 15:12:00 is the start of some occurrence, and 2021-07-03 16:11:00 is the end of the same occurrence. The Value and Type are identical within each pair, but vary between pairs. Nothing makes a given Value and Type combination unique: the same combination can appear in more than one pair.
I'm looking for a way to downsample or squash every two rows into a single row, so I'd end up with something like:
Start | End | Value | Type |
---|---|---|---|
2021-07-03 15:12:00 | 2021-07-03 16:11:00 | 1 | 2.a |
2021-07-04 14:25:00 | 2021-07-04 15:50:00 | 1 | 2.a |
2021-07-04 17:07:00 | 2021-07-04 18:06:00 | 2 | 2.c |
2021-07-04 20:14:00 | 2021-07-05 00:00:00 | 3 | 2.a |
Here Start and End would remain pandas Timestamps. Since the Value and Type are the same on both rows of a pair, one row's copy can be discarded and the other's kept.
Is there any pandas way of doing this or should I be looping through rows myself to do something like this?
The closest post I've found is here, but doesn't that solution assume the data set has a unique Name and Month, since it uses groupby? In my dataset there is nothing unique about the Value and Type combination.
I also considered aggregate but couldn't manage to figure out how to do this particular operation.
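To make the goal concrete, here is a rough sketch of the kind of thing I imagine might work. It assumes the rows are always strictly ordered as start, stop, start, stop, with no missing or extra rows. The `pair_id` key is a helper I made up for illustration: it groups rows by position (0-1, 2-3, ...) rather than by Value and Type, so it does not need that combination to be unique.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Timestamp': pd.to_datetime(['2021-07-03 15:12:00', '2021-07-03 16:11:00',
                                 '2021-07-04 14:25:00', '2021-07-04 15:50:00',
                                 '2021-07-04 17:07:00', '2021-07-04 18:06:00',
                                 '2021-07-04 20:14:00', '2021-07-05 00:00:00']),
    'Value': [1, 1, 1, 1, 2, 2, 3, 3],
    'Type': ['2.a', '2.a', '2.a', '2.a', '2.c', '2.c', '2.a', '2.a'],
})

# Hypothetical positional key (assumption: rows come in strict start/stop order):
# rows 0 and 1 get id 0, rows 2 and 3 get id 1, and so on.
pair_id = np.arange(len(df)) // 2

# Named aggregation: first timestamp of each pair is the start, last is the end;
# Value and Type are taken from the first row of the pair.
result = df.groupby(pair_id).agg(
    Start=('Timestamp', 'first'),
    End=('Timestamp', 'last'),
    Value=('Value', 'first'),
    Type=('Type', 'first'),
)
print(result)
```

If the pairing assumption holds, taking every other row with `df.iloc[::2]` and `df.iloc[1::2]` and combining the two halves would presumably give the same result without a groupby, but I'm not sure which is considered more idiomatic.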