1

I have a dataframe with let's say 2 columns: dates and doubles

 2017-05-01   2.5
 2017-05-02   3.5
 ...          ...
 2017-05-17   0.2
 2017-05-18   2.5

Now I would like to do a groupby and sum with x rows. So i.e. with 6 rows it would return:

 2017-05-06  15.6
 2017-05-12  13.4
 2017-05-18  18.0

Is there a clean solution to do this without running it through a for-loop with something like this:

 temp = pd.DataFrame()
 j = 0
 for i in range(0,len(df.index),6):
      temp[df.ix[i]['date']] = df.ix[i:i+6]['value'].sum()
user3605780
  • 6,542
  • 13
  • 42
  • 67

2 Answers2

3

I guess you are looking for resample. consider this dataframe

rng = pd.date_range('2017-05-01', periods=18, freq='D')
num = np.random.randint(5,size = 18)
df = pd.DataFrame({'date': rng, 'val': num})

df.resample('6D', on = 'date').sum().reset_index()

will return

    date        val
0   2017-05-01  14
1   2017-05-07  11
2   2017-05-13  16
Vaishali
  • 37,545
  • 5
  • 58
  • 86
  • I accepted before but when I tried your example I get the error: TypeError: resample() got an unexpected keyword argument 'on' args = ("resample() got an unexpected keyword argument 'on'",) with_traceback = – user3605780 May 18 '17 at 14:39
  • I noticed I was using pandas 0.17, after upgrading it did work. – user3605780 May 18 '17 at 14:55
  • I am sorry I missed both the comments, glad that it worked out though – Vaishali May 18 '17 at 17:27
1

This is alternative solution using groupby range of length of the dataframe.

Two columns using agg

df.groupby(np.arange(len(df))//6).agg(lambda x: {'date': x.date.iloc[0], 
                                                 'value': x.value.sum()})

Multiple columns you can use first (or last) for date and sum for other columns.

group = df.groupby(np.arange(len(df))//6)
pd.concat((group['date'].first(), 
           group[[c for c in df.columns if c != 'date']].sum()), axis=1)
titipata
  • 5,321
  • 3
  • 35
  • 59
  • Thanks for the extra, but how does that work when having more then 1 value columns, so a groupby would be better then x.value.sum()? – user3605780 May 18 '17 at 08:44
  • I'm not sure that I completely understand your question. This is basically group the Dataframe in chunk of 6 rows and aggregate those rows. I'm not sure if it's going to be better than plain summation. – titipata May 18 '17 at 08:49
  • Well now you have 'value' : x.value.sum() , but if there are 12 columns in the dataframe this means you have to write this for every column and with groupby this is for all at once. – user3605780 May 18 '17 at 14:40
  • Ah, I see! I will make my answer more general :) – titipata May 18 '17 at 17:23
  • @user3605780, I updated my solution. Hope it's more general now. – titipata May 18 '17 at 21:27