pandas rolling groupby timeseries data

Question

I have found a few related questions, but none of them seem to do the trick. I want an implementation similar to this one, but using a pandas DataFrame. Below I create sample data covering all of 2016, which gives 366 rows (2016 is a leap year).

import pandas as pd
import numpy as np

# One row of random data for every day of 2016 (366 days)
dates = pd.date_range('2016-01-01', '2016-12-31')
random_data = np.random.randn(len(dates))
data = pd.DataFrame(random_data, index=dates, columns=['Test'])

I would like to use groupby to get the next 5 days of data, starting a new group every 2 days. A normal groupby does not produce overlapping time frames: grouping by 2 days gives me 183 (366/2) groups of two days each, and grouping by 5 days gives me 74 groups (366/5 = 73.2, rounded up), each five days long except the last. What I want is 183 groups of five days each.

Sorry in advance if this is not clear. Here is what I want:

            Test
2016-02-08  1.073696
2016-02-09  1.169865
2016-02-10  1.421454
2016-02-11 -0.576036
2016-02-12 -1.066921

            Test
2016-02-10  1.421454
2016-02-11 -0.576036
2016-02-12 -1.066921
2016-02-13  2.639681
2016-02-14 -0.261616

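This is what I get with data.groupby(pd.TimeGrouper('2d')) (pd.TimeGrouper has since been removed from pandas; the current equivalent is pd.Grouper(freq='2D')):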

            Test
2016-02-08  1.073696
2016-02-09  1.169865
            Test
2016-02-10  1.421454
2016-02-11 -0.576036
            Test
2016-02-12 -1.066921
2016-02-13  2.639681

This is what I get with data.groupby(pd.TimeGrouper('5d')):

            Test
2016-02-08  0.898029
2016-02-09 -0.905950
2016-02-10 -0.202483
2016-02-11  1.073696
2016-02-12  1.169865
            Test
2016-02-13  1.421454
2016-02-14 -0.576036
2016-02-15 -1.066921
2016-02-16  2.639681
2016-02-17 -0.261616
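The group counts described above can be checked with a short sketch, assuming a current pandas version where the grouping is spelled pd.Grouper(freq=...). Non-overlapping calendar bins give 183 two-day groups and 74 five-day groups (the last holding only Dec 31), while starting a five-row window at every second row, as the accepted answer below does, gives the 183 overlapping windows being asked for.

```python
import numpy as np
import pandas as pd

# Same sample data as the question: one row per day of 2016 (a leap year)
dates = pd.date_range('2016-01-01', '2016-12-31')
data = pd.DataFrame(np.random.randn(len(dates)), index=dates, columns=['Test'])

# Non-overlapping calendar bins; .size() returns the row count of each bin
sizes_2d = data.groupby(pd.Grouper(freq='2D')).size()
sizes_5d = data.groupby(pd.Grouper(freq='5D')).size()

# Overlapping windows: one window starting at every second row
starts = range(0, len(data), 2)

print(len(sizes_2d), len(sizes_5d), len(starts))  # 183 74 183
print(sizes_5d.iloc[-1])                           # 1 (only Dec 31 is left over)
```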

If any of those answers are helpful, feel free to up-vote them as well as @Psidom's answer below. If you think you are looking for something different, let me know and I'll open this back up. — piRSquared, Jan 23 '17 at 03:28
Thank you, I missed your question while googling. Is there a way to add more keywords like 'rolling' or 'groupby' so that Google ranks your question above the other ones I linked? — Bobe Kryant, Jan 23 '17 at 14:22
Your question does exactly that. You've added keywords. When people google for something that would lead them to this question, Stack Overflow redirects that traffic to my question, all because I marked it as a duplicate. This is why it's OK to ask duplicates. You asked an identical question but with different wording. In that sense, you've helped improve the site. — piRSquared, Jan 23 '17 at 14:35
Also, I've asked a bunch of numpy questions that got marked as duplicates because I can never think of the right words to search for to find what I'm looking for — piRSquared, Jan 23 '17 at 14:35

score 2 · Accepted Answer · answered Jan 23 '17 at 03:24

If the dates form a regular sequence spaced one day apart, as your sample data does, you can select rows by position: start a window at every second row and take five rows for each one:

[data.iloc[i:(i+5)] for i in range(0, len(data), 2)]

#[                Test
# 2016-01-01  0.450173
# 2016-01-02 -0.496819
# 2016-01-03  0.270781
# 2016-01-04 -0.207634
# 2016-01-05  1.032061,                 
#                 Test
# 2016-01-03  0.270781
# 2016-01-04 -0.207634
# 2016-01-05  1.032061
# 2016-01-06 -0.470462
# 2016-01-07 -1.077634, ...]
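The positional slicing above relies on there being exactly one row per day. If the index can have gaps (weekends, missing days), a variant that windows by calendar time instead might look like the following. This is a sketch, not part of the original answer; the helper name date_windows and its width/step parameters are hypothetical. Each window holds the rows whose timestamp falls in [start, start + 5 days), and a new window starts every 2 days.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('2016-01-01', '2016-12-31')
data = pd.DataFrame(np.random.randn(len(dates)), index=dates, columns=['Test'])

def date_windows(df, width='5D', step='2D'):
    """Hypothetical helper: split df into overlapping calendar windows.

    A window starts every `step`, and each one holds the rows whose
    timestamp falls in [start, start + width). Assumes a sorted DatetimeIndex.
    """
    width = pd.Timedelta(width)
    starts = pd.date_range(df.index[0], df.index[-1], freq=step)
    # A boolean mask keeps the right edge exclusive (.loc[a:b] would include b)
    return [df[(df.index >= s) & (df.index < s + width)] for s in starts]

windows = date_windows(data)
print(len(windows))         # 183
print(windows[1].index[0])  # 2016-01-03 00:00:00

# With a missing day, windows still span 5 calendar days,
# so they can contain fewer than 5 rows
gappy = data.drop(pd.Timestamp('2016-01-04'))
print(len(date_windows(gappy)[0]))  # 4
```

On gap-free daily data this returns the same windows as the list comprehension above. Each window costs one full mask over the index, which is fine at this size; for very long series, df.index.searchsorted could locate the slice bounds instead.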

If you added that to the other post, I'd upvote it there as well. — piRSquared, Jan 23 '17 at 03:47
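If only one number per window is needed (a mean, say) rather than the window DataFrames themselves, a different technique from the accepted answer is to compute a fixed-size rolling aggregate and keep every second result. A sketch, assuming the same one-row-per-day index: Series.rolling(5) labels each window by its last row, so the window that begins at row i is stored at row i + 4.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('2016-01-01', '2016-12-31')
data = pd.DataFrame(np.random.randn(len(dates)), index=dates, columns=['Test'])

# Mean over each 5-row window, labelled by the window's LAST row
rolled = data['Test'].rolling(5).mean()

# The window starting at row 0 ends at row 4; step by 2 from there.
# The first 4 entries of `rolled` are NaN (incomplete windows) and are skipped.
means = rolled.iloc[4::2].copy()

# Relabel each mean by its window's START date, matching data.iloc[i:i+5]
means.index = data.index[: 2 * len(means) : 2]

print(len(means))  # 181
```

Only the 181 complete windows are kept; the list comprehension in the answer also returns two shorter windows at the end of the year (4 rows and 2 rows).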
