
I found this description of how to resample a multi-index:

Resampling Within a Pandas MultiIndex

However, as soon as I use count instead of sum, the solution no longer works.

This might be related to: Resampling with 'how=count' causing problems

The non-working version, with count and a string column:

import datetime
import pandas as pd

values_a = [1]*16
states = ['Georgia']*8 + ['Alabama']*8
#cities = ['Atlanta']*4 + ['Savanna']*4 + ['Mobile']*4 + ['Montgomery']*4
dates = pd.DatetimeIndex([datetime.datetime(2012,1,1)+datetime.timedelta(days = i) for i in range(4)]*4)
df2 = pd.DataFrame(
    {'value_a': values_a},
    index = [states, dates])
df2.index.names = ['State', 'Date']
# Move 'State' out of the index into a regular (string) column
df2.reset_index(level=[0], inplace=True)
print(df2.groupby(['State']).resample('W',how='count'))

Yields:

         2012-01-01           2012-01-08         
              State  value_a       State  value_a
State                                            
Alabama           2        2           6        6
Georgia           2        2           6        6

The working version, with sum and numeric values:

values_a =[1]*16
states = ['Georgia']*8 + ['Alabama']*8
#cities = ['Atlanta']*4 + ['Savanna']*4 + ['Mobile']*4 + ['Montgomery']*4
dates = pd.DatetimeIndex([datetime.datetime(2012,1,1)+datetime.timedelta(days = i) for i in range(4)]*4)
df2 = pd.DataFrame(
    {'value_a': values_a},
    index = [states, dates])
df2.index.names = ['State', 'Date']
df2.reset_index(level=[0], inplace=True)
print(df2.groupby(['State']).resample('W',how='sum'))

Yields (notice no duplication of 'State'):

                    value_a
State   Date               
Alabama 2012-01-01        2
        2012-01-08        6
Georgia 2012-01-01        2
        2012-01-08        6
Cilvic

2 Answers


When using count, State isn't a nuisance column (strings can be counted), so resample applies count to it as well (although the output is not what I would expect). You can tell it to apply count only to value_a:

>>> print(df2.groupby(['State']).resample('W',how={'value_a':'count'}))

                    value_a
State   Date               
Alabama 2012-01-01        2
        2012-01-08        6
Georgia 2012-01-01        2
        2012-01-08        6

Or more generally, you can apply different kinds of how to different columns:

>>> print(df2.groupby(['State']).resample('W',how={'value_a':'count','State':'last'}))

                      State  value_a
State   Date                        
Alabama 2012-01-01  Alabama        2
        2012-01-08  Alabama        6
Georgia 2012-01-01  Georgia        2
        2012-01-08  Georgia        6

So while the above lets you count a resampled multi-index DataFrame, it doesn't explain the output of how='count'. The following is closer to the way I would expect it to behave:

>>> print(df2.groupby(['State']).resample('W',how={'value_a':'count','State':'count'}))

                   State  value_a
State   Date                      
Alabama 2012-01-01      2        2
        2012-01-08      6        6
Georgia 2012-01-01      2        2
        2012-01-08      6        6
Karl D.
  • @Karl D. Do you want to post the answer here as well? http://stackoverflow.com/questions/23688355/how-can-i-count-a-resampled-multi-indexed-dataframe-in-pandas/23689595?noredirect=1#23689595 – Cilvic May 16 '14 at 04:34
  • That link just takes me to this question. – Karl D. May 16 '14 at 04:39
  • Sorry I meant this one: http://stackoverflow.com/questions/19745656/resampling-with-how-count-causing-problems – Cilvic May 16 '14 at 04:45
  • Yeah, that does appear to be related. – Karl D. May 16 '14 at 04:47

@Karl D.'s solution is correct. In 0.14/master (releasing shortly) this will also be possible directly with pd.Grouper; see docs here

In [118]: df2.groupby([pd.Grouper(level='Date',freq='W'),'State']).count()
Out[118]: 
                    value_a
Date       State           
2012-01-01 Alabama        2
           Georgia        2
2012-01-08 Alabama        6
           Georgia        6

Prior to 0.14 it was difficult to group by / resample with a time-based grouper together with another grouper. pd.Grouper allows a very flexible specification for this.
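As a self-contained sketch of the pd.Grouper approach, the snippet below rebuilds the question's data and groups on both index levels at once. It assumes the original (State, Date) MultiIndex is kept rather than reset, and the name counts is only illustrative.

```python
import datetime

import pandas as pd

# Rebuild the question's data with a two-level (State, Date) index.
states = ['Georgia'] * 8 + ['Alabama'] * 8
dates = pd.DatetimeIndex(
    [datetime.datetime(2012, 1, 1) + datetime.timedelta(days=i) for i in range(4)] * 4
)
index = pd.MultiIndex.from_arrays([states, dates], names=['State', 'Date'])
df2 = pd.DataFrame({'value_a': [1] * 16}, index=index)

# One Grouper bins the 'Date' level into weeks, the other groups on 'State' as-is.
counts = df2.groupby(
    [pd.Grouper(level='Date', freq='W'), pd.Grouper(level='State')]
).count()
print(counts)
```

Because State lives in the index here and not in a column, count only has value_a to work on, so no duplicated State column can appear in the result.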

Jeff