15

I have a time series (ts) indexed by a DatetimeIndex and want to group it into 10-minute intervals.

index   x  y  z

ts1     ....
ts2     ....
...

I know how to group by 1 minute:

import datetime

def group_by_minute(timestamp):
    # Truncate the timestamp to the start of its minute (drops seconds and below).
    year = timestamp.year
    month = timestamp.month
    day = timestamp.day
    hour = timestamp.hour
    minute = timestamp.minute
    return datetime.datetime(year, month, day, hour, minute)

then

ts.groupby(group_by_minute, axis=0)

My custom function (roughly):

def my_function(group):
    first_latitude = group['latitude'].sort_index().head(1).values[0]
    last_longitude = group['longitude'].sort_index().tail(1).values[0]
    return first_latitude - last_longitude

So the ts DataFrame must contain 'latitude' and 'longitude' columns.

When using TimeGrouper

   ts.groupby(pd.TimeGrouper(freq='100min')).apply(my_function)

I got the following error:

TypeError: cannot concatenate a non-NDFrame object
Hello lad
  • Have you tried `resample`? E.g. `df.resample('1min', 'mean')`. What aggregation are you doing? – JoeCondron Aug 21 '15 at 19:28
  • @JoeCondron I am applying custom functions with apply. It seems to me that resample and TimeGrouper fill in gaps automatically, even when there is a time gap of one year. Is there a way to prevent this? Thanks a lot – Hello lad Aug 21 '15 at 19:33
  • You can pass your custom function like: `df.resample('10min', how=my_func)`. It won't fill gaps unless you tell it to. Maybe you should post the function you want to pass and the desired output. Alternatively, you can adjust the `minute` line of your function to `minute = 10 * (timestamp.minute // 10)`. – JoeCondron Aug 21 '15 at 19:44
  • @JoeCondron Thanks for the suggestion. I have switched to resample and it almost works, except that resample only takes the first column of df. Can it apply to multiple columns of df at the same time? I will edit my function into the question. Thanks again – Hello lad Aug 21 '15 at 19:47
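
The minute-flooring idea from the comments above can be sketched as follows. This is only an illustration, assuming Python 3 (hence integer division with `//`); the name `group_by_10_minutes` and the sample data are made up for the example and are not from the question.

```python
import datetime

import pandas as pd


def group_by_10_minutes(timestamp):
    # Floor the minute to the start of its 10-minute bucket, e.g. 19:09 -> 19:00.
    minute = 10 * (timestamp.minute // 10)
    return datetime.datetime(
        timestamp.year, timestamp.month, timestamp.day, timestamp.hour, minute
    )


# Hypothetical sample data: two rows fall in 19:00-19:10, one in 19:10-19:20.
index = pd.to_datetime(["2015-08-21 19:03", "2015-08-21 19:09", "2015-08-21 19:14"])
ts = pd.DataFrame({"x": [1, 2, 3]}, index=index)

# groupby calls the function on each index value; only buckets that
# actually contain rows appear in the result, so no gaps are filled in.
counts = ts.groupby(group_by_10_minutes).size()
print(counts)
```

Because the key is computed per row, this approach never creates empty buckets, at the cost of a Python-level function call for every timestamp.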

2 Answers

19

There is a pandas.TimeGrouper for this sort of thing. What you described would be something like:

agg_10m = df.groupby(pd.TimeGrouper(freq='10Min')).aggregate(numpy.sum)  # or another function
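
For a custom function passed to `apply`, a rough sketch is below. It uses `pd.Grouper`, which replaces the deprecated `TimeGrouper`. It rests on an assumption: time-based grouping can produce empty bins where the data has gaps, and a function that indexes the first or last row would fail on those, so the function guards against empty groups. Whether that is what triggers the TypeError reported in the question is not confirmed. The sample data is hypothetical.

```python
import numpy as np
import pandas as pd


def my_function(group):
    # Assumption: bins with no rows may be passed in as empty frames,
    # so return NaN instead of indexing into nothing.
    if group.empty:
        return np.nan
    group = group.sort_index()
    return group["latitude"].iloc[0] - group["longitude"].iloc[-1]


# Hypothetical data with a gap: nothing between 19:10 and 19:30.
index = pd.to_datetime(["2015-08-21 19:03", "2015-08-21 19:09", "2015-08-21 19:34"])
ts = pd.DataFrame(
    {"latitude": [10.0, 11.0, 12.0], "longitude": [1.0, 2.0, 3.0]},
    index=index,
)

# Group into 10-minute bins, apply the function, then drop the empty bins.
result = ts.groupby(pd.Grouper(freq="10min")).apply(my_function).dropna()
print(result)
```

Dropping the NaN rows afterwards keeps only the intervals that actually contained data.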
CT Zhu
  • Thanks for the response. It seems that pd.TimeGrouper does exist, but it isn't documented here: http://pandas.pydata.org/pandas-docs/stable/api.html – Hello lad Aug 21 '15 at 19:52
  • Oops, you are right. I never noticed it is undocumented. – CT Zhu Aug 21 '15 at 19:57
  • I get a "TypeError: cannot concatenate a non-NDFrame object" when applying TimeGrouper – Hello lad Aug 21 '15 at 20:16
  • TimeGrouper is sort of documented: it's in the cookbook, http://pandas.pydata.org/pandas-docs/stable/cookbook.html#resampling. It's odd that it doesn't seem to be covered outside of that, though. – JohnE Aug 21 '15 at 22:40
  • http://pandas.pydata.org/pandas-docs/stable/groupby.html#grouping-with-a-grouper-specification is the canonical method for time grouping; this creates a TimeGrouper which is not public per se – Jeff Aug 22 '15 at 01:31
  • Just leaving a note here that `TimeGrouper` is now deprecated, and [`Grouper`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Grouper.html) should be used instead – user5305519 Jul 17 '20 at 09:37
17

I know this is old but pd.Grouper() will also accomplish this:

agg_10m = df.groupby(pd.Grouper(freq='10Min')).aggregate(numpy.sum)
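
To illustrate how this extends to several columns at once, here is a small sketch that computes "first latitude minus last longitude" per 10-minute bin using the built-in `first()` and `last()` aggregations rather than `apply`. The column names come from the question; the sample values are made up, and treating NaN rows as empty bins to drop is an assumption about the desired output.

```python
import pandas as pd

# Hypothetical sample data with the columns mentioned in the question.
index = pd.to_datetime(["2015-08-21 19:03", "2015-08-21 19:09", "2015-08-21 19:34"])
df = pd.DataFrame(
    {"latitude": [10.0, 11.0, 12.0], "longitude": [1.0, 2.0, 3.0]},
    index=index,
)

# Sort once so "first" and "last" follow time order within each bin.
grouped = df.sort_index().groupby(pd.Grouper(freq="10min"))

# Empty bins yield NaN from first()/last(), so dropna() removes the gaps.
result = (grouped["latitude"].first() - grouped["longitude"].last()).dropna()
print(result)
```

Selecting columns from the grouped object lets each column use its own aggregation while sharing the same 10-minute bins.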
Andrew L