
I have a large pandas DataFrame containing the columns `timestamp`, `name`, and `value`:

index    timestamp                     name   value
0        1999-12-31 23:59:59.000107    A      16
1        1999-12-31 23:59:59.000385    B      12
2        1999-12-31 23:59:59.000404    C      25 
3        1999-12-31 23:59:59.000704    B      15
4        1999-12-31 23:59:59.001281    A      300
5        1999-12-31 23:59:59.002211    C      20
6        1999-12-31 23:59:59.002367    C      3

I want to group by time buckets (say 20ms or 20 minutes) and name, and calculate the average value for each group.

What is the most efficient way to do this?

volatile

1 Answer


You can use `pd.Grouper`, but it requires the timestamps to be on the index. So you could try something like:

df.set_index('timestamp').groupby([pd.Grouper(freq='20Min'), 'name']).mean()
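
For reference, a minimal runnable sketch built from the sample rows in the question, using the 20 ms bucket width mentioned there (swap in `freq='20Min'` for 20-minute buckets):

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'timestamp': pd.to_datetime([
        '1999-12-31 23:59:59.000107',
        '1999-12-31 23:59:59.000385',
        '1999-12-31 23:59:59.000404',
        '1999-12-31 23:59:59.000704',
        '1999-12-31 23:59:59.001281',
        '1999-12-31 23:59:59.002211',
        '1999-12-31 23:59:59.002367',
    ]),
    'name': ['A', 'B', 'C', 'B', 'A', 'C', 'C'],
    'value': [16, 12, 25, 15, 300, 20, 3],
})

# Put the timestamps on the index, then group into 20 ms buckets per name
# and take the mean of 'value' within each (bucket, name) group.
out = df.set_index('timestamp').groupby([pd.Grouper(freq='20ms'), 'name'])['value'].mean()
print(out)
# All sample rows fall inside a single 20 ms bucket, so the means are
# A -> 158.0, B -> 13.5, C -> 16.0
```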
Gustavo Bezerra
  • Thanks, exactly what I was looking for – volatile Mar 10 '16 at 08:41
  • 2
    `pd.TimeGrouper` is now deprecated and the proper way to do this is to instead use `pd.Grouper`. – Daren Eiri Nov 06 '18 at 18:31
  • 2
    To see the specs for the `freq`, look here: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases – Ben Dec 03 '19 at 04:41
  • Great answer. What if we want to count the missing values of a specific column after grouping? How can we count them? – user2064809 Mar 25 '22 at 13:29
  • 1
    @user2064809 You probably need to use `isnull` and `groupby`. See this: https://stackoverflow.com/questions/46106954/using-isnull-and-groupby-on-a-pandas-dataframe – Gustavo Bezerra Mar 26 '22 at 00:48
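
Regarding that last comment, here is a hedged sketch of one way to count missing values per group; the tiny frame with a `NaN` below is made up purely for illustration:

```python
import numpy as np
import pandas as pd

# Made-up frame with a missing value, just to illustrate the idea
df = pd.DataFrame({
    'timestamp': pd.to_datetime([
        '1999-12-31 23:59:59.000107',
        '1999-12-31 23:59:59.000385',
        '1999-12-31 23:59:59.000704',
    ]),
    'name': ['A', 'B', 'B'],
    'value': [16, np.nan, 15],
})

# Count NaNs in 'value' for each (time bucket, name) group
nan_counts = (
    df.set_index('timestamp')
      .groupby([pd.Grouper(freq='20Min'), 'name'])['value']
      .apply(lambda s: s.isna().sum())
)
print(nan_counts)
```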