4

I am using Pandas Timegrouper to group datapoints in a pandas dataframe in python:

grouped = data.groupby(pd.TimeGrouper('30S'))

I would like to know if there's a way to achieve window overlap, like suggested in this question: Window overlap in Pandas while keeping the pandas dataframe as data structure.

Update: tested timing of the three solutions proposed below and the rolling mean seems faster:

%timeit df.groupby(pd.TimeGrouper('30s',closed='right')).mean()
%timeit df.resample('30s',how='mean',closed='right')
%timeit pd.rolling_mean(df,window=30).iloc[29::30]

yields:

1000 loops, best of 3: 336 µs per loop
1000 loops, best of 3: 349 µs per loop
1000 loops, best of 3: 199 µs per loop
Community
  • 1
  • 1
Fra
  • 4,918
  • 7
  • 33
  • 50

1 Answers1

4

Create some data exactly 3 x 30s long

In [51]: df = DataFrame(randn(90,2),columns=list('AB'),index=date_range('20130101 9:01:01',freq='s',periods=90))

Using a TimeGrouper in this way is equivalent of resample (and that's what resample actually does) Note that I used closed to make sure that exactly 30 observations are included

In [57]: df.groupby(pd.TimeGrouper('30s',closed='right')).mean()
Out[57]: 
                            A         B
2013-01-01 09:01:00 -0.214968 -0.162200
2013-01-01 09:01:30 -0.090708 -0.021484
2013-01-01 09:02:00 -0.160335 -0.135074

In [52]: df.resample('30s',how='mean',closed='right')
Out[52]: 
                            A         B
2013-01-01 09:01:00 -0.214968 -0.162200
2013-01-01 09:01:30 -0.090708 -0.021484
2013-01-01 09:02:00 -0.160335 -0.135074

This is also equivalent if you then pick out the 30s intervals

In [55]: pd.rolling_mean(df,window=30).iloc[28:40]
Out[55]: 
                            A         B
2013-01-01 09:01:29       NaN       NaN
2013-01-01 09:01:30 -0.214968 -0.162200
2013-01-01 09:01:31 -0.150401 -0.180492
2013-01-01 09:01:32 -0.160755 -0.142534
2013-01-01 09:01:33 -0.114918 -0.181424
2013-01-01 09:01:34 -0.098945 -0.221110
2013-01-01 09:01:35 -0.052450 -0.169884
2013-01-01 09:01:36 -0.011172 -0.185132
2013-01-01 09:01:37  0.100843 -0.178179
2013-01-01 09:01:38  0.062554 -0.097637
2013-01-01 09:01:39  0.048834 -0.065808
2013-01-01 09:01:40  0.003585 -0.059181

So depending on what you want to achieve, its easy to do an overlap, by using rolling_mean and then pick out whatever 'frequency' you want. Eg here is a 5s resample with a 30s interval.

In [61]: pd.rolling_mean(df,window=30)[9::5]
Out[61]: 
                            A         B
2013-01-01 09:01:10       NaN       NaN
2013-01-01 09:01:15       NaN       NaN
2013-01-01 09:01:20       NaN       NaN
2013-01-01 09:01:25       NaN       NaN
2013-01-01 09:01:30 -0.214968 -0.162200
2013-01-01 09:01:35 -0.052450 -0.169884
2013-01-01 09:01:40  0.003585 -0.059181
2013-01-01 09:01:45 -0.055886 -0.111228
2013-01-01 09:01:50 -0.110191 -0.045032
2013-01-01 09:01:55  0.093662 -0.036177
2013-01-01 09:02:00 -0.090708 -0.021484
2013-01-01 09:02:05 -0.286759  0.020365
2013-01-01 09:02:10 -0.273221 -0.073886
2013-01-01 09:02:15 -0.222720 -0.038865
2013-01-01 09:02:20 -0.175630  0.001389
2013-01-01 09:02:25 -0.301671 -0.025603
2013-01-01 09:02:30 -0.160335 -0.135074
Jeff
  • 125,376
  • 21
  • 220
  • 187
  • Thanks! what if you don't want to apply a rolling mean but an arbitrary function in rolling overlapped windows? The reason I used time grouper was to extract the groups, and then look at each group separately. – Fra Aug 23 '13 at 00:26
  • 2
    you can use any rolling function of rolling_apply with an arbitrary function – Jeff Aug 23 '13 at 01:11
  • 1
    Thanks! `rolling_apply` answers my question – Fra Aug 23 '13 at 02:30
  • What if I just want to return the overlapping groups for later processing? – Fra Sep 16 '13 at 18:55
  • my understanding is that TimeGrouper groups starting at the beginning of a minute or second, so for example, if I do TimeGrouper('30s') it will group points from 0-30s and points from 30-60s. What if I want to group points 15-45s? – Fra Sep 17 '13 at 23:41
  • see here and the cookbook. http://pandas.pydata.org/pandas-docs/dev/timeseries.html#up-and-downsampling; you also might be able to just tshift('15s') then resample at 30s freq. alternatively there is a loffset argument u can pass to TimeGrouper – Jeff Sep 18 '13 at 00:08
  • tshif('15s') is what I'm currently doing :-) Is TimeGrouper documented? I couldn't really understand how loffset worked. – Fra Sep 18 '13 at 21:26