3

I have a data frame with thousands of line looking like this:

    time                type      value
0   09:30:01.405735     EVENT_0     2.1
0   09:30:01.405761     EVENT_0     2.1
0   09:30:01.419743     EVENT_0     1.1
1   09:30:02.419769     EVENT_0     32.1
2   09:30:02.419775     EVENT_0     2.15
3   09:30:02.419775     EVENT_0     24.1
4   09:30:06.419775     EVENT_0     3.1
5   09:30:06.419793     EVENT_0     1.1
6   09:30:06.419793     EVENT_0     2.4
....     

We define a "window" as a continuous list of events that are not separated by more than 1 second (that is, a gap of 1 second or more between two consecutive events create a new window)

Here we would have 3 windows:

    time                type      value
0   09:30:01.405735     EVENT_0     2.1
0   09:30:01.405761     EVENT_0     2.1
0   09:30:01.419743     EVENT_0     1.1

====================================

1   09:30:02.419769     EVENT_0     32.1
2   09:30:02.419775     EVENT_0     2.15
3   09:30:02.419775     EVENT_0     24.1

====================================

4   09:30:06.419775     EVENT_0     3.1
5   09:30:06.419793     EVENT_0     1.1
6   09:30:06.419793     EVENT_0     2.4
....     

I have trying to find a way to compute the average of the "value" column for each window, but can't find a way to do it properly in pandas.

Thomas K
  • 39,200
  • 7
  • 84
  • 86
quantdev
  • 23,517
  • 5
  • 55
  • 88
  • Did you manage to create the windows ? How do you define two elements in the same window ? Not separated by more than 1 second **or** belonging to the same second ? – MMF Oct 06 '16 at 15:20
  • @MMF : no I did not, in fact this is the main question, how to create those windows using pandas – quantdev Oct 06 '16 at 15:33
  • So, how do you define two elements in the same window ? Not separated by more than 1 second or belonging to the same second ? – MMF Oct 06 '16 at 15:42

1 Answers1

4

Assuming you time column is of datetime format and the data frame is sorted according to the time column:

# calculate the windows, gives a unique number per entry associating it to its respective window
windows = (data.time.diff().apply(lambda x: x.total_seconds()) >= 1).astype(int).cumsum()
# group by these windows and compute the value mean
data.groupby(windows).value.mean()
languitar
  • 6,554
  • 2
  • 37
  • 62
  • This is a modification of the approach presented here: https://stackoverflow.com/questions/14358567/finding-consecutive-segments-in-a-pandas-data-frame – languitar Oct 06 '16 at 15:37