Pandas - Grouping rows in time buckets

Question

I have a data frame with thousands of line looking like this:

    time                type      value
0   09:30:01.405735     EVENT_0     2.1
0   09:30:01.405761     EVENT_0     2.1
0   09:30:01.419743     EVENT_0     1.1
1   09:30:02.419769     EVENT_0     32.1
2   09:30:02.419775     EVENT_0     2.15
3   09:30:02.419775     EVENT_0     24.1
4   09:30:06.419775     EVENT_0     3.1
5   09:30:06.419793     EVENT_0     1.1
6   09:30:06.419793     EVENT_0     2.4
....

We define a "window" as a continuous list of events that are not separated by more than 1 second (that is, a gap of 1 second or more between two consecutive events create a new window)

Here we would have 3 windows:

    time                type      value
0   09:30:01.405735     EVENT_0     2.1
0   09:30:01.405761     EVENT_0     2.1
0   09:30:01.419743     EVENT_0     1.1

====================================

1   09:30:02.419769     EVENT_0     32.1
2   09:30:02.419775     EVENT_0     2.15
3   09:30:02.419775     EVENT_0     24.1

====================================

4   09:30:06.419775     EVENT_0     3.1
5   09:30:06.419793     EVENT_0     1.1
6   09:30:06.419793     EVENT_0     2.4
....

I have trying to find a way to compute the average of the "value" column for each window, but can't find a way to do it properly in pandas.

Did you manage to create the windows ? How do you define two elements in the same window ? Not separated by more than 1 second **or** belonging to the same second ? — MMF, Oct 06 '16 at 15:20
@MMF : no I did not, in fact this is the main question, how to create those windows using pandas — quantdev, Oct 06 '16 at 15:33
So, how do you define two elements in the same window ? Not separated by more than 1 second or belonging to the same second ? — MMF, Oct 06 '16 at 15:42

score 4 · Accepted Answer · answered Oct 06 '16 at 15:34

4

Assuming you time column is of datetime format and the data frame is sorted according to the time column:

# calculate the windows, gives a unique number per entry associating it to its respective window
windows = (data.time.diff().apply(lambda x: x.total_seconds()) >= 1).astype(int).cumsum()
# group by these windows and compute the value mean
data.groupby(windows).value.mean()

answered Oct 06 '16 at 15:34

languitar

6,554
2
37
62

This is a modification of the approach presented here: https://stackoverflow.com/questions/14358567/finding-consecutive-segments-in-a-pandas-data-frame – languitar Oct 06 '16 at 15:37

Pandas - Grouping rows in time buckets

1 Answers1