11

I have the following DataFrame:

df = pd.DataFrame({
'Branch' : 'A A A A A B'.split(),
'Buyer': 'Carl Mark Carl Joe Joe Carl'.split(),
'Quantity': [1,3,5,8,9,3],
'Date' : [
DT.datetime(2013,1,1,13,0),
DT.datetime(2013,1,1,13,5),
DT.datetime(2013,10,1,20,0),
DT.datetime(2013,10,2,10,0),
DT.datetime(2013,12,2,12,0),                                      
DT.datetime(2013,12,2,14,0),
]})

from pandas.tseries.resample import TimeGrouper

How can I group this data by the Branch and on a 20 day period using TimeGrouper?

All my previous attempts failed, because I could not combine TimeGrouper with another argument in the groupby function.

I would deeply appreciate your help.

Thank you

Andy

Andy
  • 9,483
  • 12
  • 38
  • 39
  • Applying over a DataFrameGroupby from TestGrouper doesn't seem to work at all e.g. `g = df1.groupby(pd.TimeGrouper(freq='20D')); g.apply(testgr);` :s Perhaps worth posting a github [issue](https://github.com/pydata/pandas/issues/new). – Andy Hayden Jun 07 '13 at 11:44
  • Hi Andy, have you added the line: from pandas.tseries.resample import TimeGrouper. I updated my example accordingly – Andy Jun 07 '13 at 12:59
  • Yeah, I can access the function, the problem is the groupby doesn't have .group methods etc. – Andy Hayden Jun 07 '13 at 13:08

2 Answers2

17

You can now use a TimeGrouper with another column (as of IIRC pandas version 0.14):

In [11]: df1 = df.set_index('Date')

In [12]: g = df1.groupby([pd.TimeGrouper('20D'), 'Branch'])

In [13]: g.sum()
Out[13]:
                            Quantity
Date                Branch
2013-01-01 13:00:00 A              4
2013-09-18 13:00:00 A             13
2013-11-17 13:00:00 A              9
                    B              3
Andy Hayden
  • 359,921
  • 101
  • 625
  • 535
5

From the discussion here: https://github.com/pydata/pandas/issues/3791

In [38]: df.set_index('Date').groupby(pd.TimeGrouper('6M')).apply(lambda x: x.groupby('Branch').sum())
Out[38]: 
                   Quantity
           Branch          
2013-01-31 A              4
2014-01-31 A             22
           B              3

And a bit more complicated question

In [55]: def testf(df):
   ....:     if (df['Buyer'] == 'Mark').sum() > 0:
   ....:         return Series(dict(quantity = df['Quantity'].sum(), buyer = 'mark'))
   ....:     return Series(dict(quantity = df['Quantity'].sum()*100, buyer = 'other'))
   ....: 

In [56]: df.set_index('Date').groupby(pd.TimeGrouper('6M')).apply(lambda x: x.groupby('Branch').apply(testf))
Out[56]: 
                   buyer quantity
           Branch                
2013-01-31 A        mark        4
2014-01-31 A       other     2200
           B       other      300
Jeff
  • 125,376
  • 21
  • 220
  • 187