4

I have a simple pandas dataframe that has measurements at various times:

                     volume
t
2013-10-13 02:45:00      17
2013-10-13 05:40:00      38
2013-10-13 09:30:00      29
2013-10-13 11:40:00      25
2013-10-13 12:50:00      11
2013-10-13 15:00:00      17
2013-10-13 17:10:00      15
2013-10-13 18:20:00      12
2013-10-13 20:30:00      20
2013-10-14 03:45:00       9
2013-10-14 06:40:00      30
2013-10-14 09:40:00      43
2013-10-14 11:05:00      10

I'm doing some basic resampling and plotting, such as the daily total volume, which works fine:

df.resample('D',how='sum').head()   

            volume
t
2013-10-13     184
2013-10-14     209
2013-10-15     197
2013-10-16     309
2013-10-17     317

But for some reason when I try do the total number of entries per day, it returns a a multiindex series instead of a dataframe:

df.resample('D',how='count').head()

2013-10-13  volume     9
2013-10-14  volume     9
2013-10-15  volume     7
2013-10-16  volume     9
2013-10-17  volume    10

I can fix the data so it's easily plotted with a simple unstack call, i.e. df.resample('D',how='count').unstack(), but why does calling resample with how='count' have a different behavior than with how='sum'?

moustachio
  • 2,924
  • 3
  • 36
  • 68

1 Answers1

7

It does appear the resample and count leads to some odd behavior in terms of how the resulting dataframe is structured (Well, at least up to 0.13.1). See here for a slightly different but related context: Count and Resampling with a mutli-ndex

You can use the same strategy here:

>>> df
                     volume
date                       
2013-10-13 02:45:00      17
2013-10-13 05:40:00      38
2013-10-13 09:30:00      29
2013-10-13 11:40:00      25
2013-10-13 12:50:00      11
2013-10-13 15:00:00      17
2013-10-13 17:10:00      15
2013-10-13 18:20:00      12
2013-10-13 20:30:00      20
2013-10-14 03:45:00       9
2013-10-14 06:40:00      30
2013-10-14 09:40:00      43
2013-10-14 11:05:00      10

So here is your issue:

>>> df.resample('D',how='count')

2013-10-13  volume    9
2013-10-14  volume    4

You can fix the issue by specifying that count applies to the volume column with a dict in the resample call:

>>> df.resample('D',how={'volume':'count'})

            volume
date              
2013-10-13       9
2013-10-14       4
Community
  • 1
  • 1
Karl D.
  • 13,332
  • 5
  • 56
  • 38
  • So I guess we can assume this is a bug. Nice workaround, though. – moustachio May 16 '14 at 13:35
  • is there a simple way to achieve the same result with multiple columns? For me this works but it is a lot of typing with more columns: df.resample('D',how={'volume':'count', 'open':'count', 'high':'count'}) – Markus W Aug 03 '15 at 12:11
  • 1
    `how` parameter is now deprecated, use this: https://stackoverflow.com/questions/41408082/modify-ohlc-resample-code-as-per-deprecated-warning – Álvaro Loza Oct 24 '17 at 14:28