How to calculate the quartile statistics of a column using the groupby function?

Question

I have data in 1 min intervals, and I want to change the granularity to 5 mins, and calculate the basic data statistics using .groupby as such:

   df2 = df1.groupby(pd.Grouper(freq='5Min',closed='right',label='right')).agg({
                                        "value1":  "mean", "value2": "max",
                                        "value3": "quantile"})

I also want quartile/quantile data, but I can't find a way to specify which quantile to compute. The default is the 50th percentile (the median). How do I get the 75th percentile for value3?

George · Answer 1 · 2022-08-11T19:46:32.110

1

You can use the groupby.quantile function, which lets you specify the exact quantile and even choose the interpolation method. I'm not sure it is possible to do everything in one step; you may need to compute the quantiles separately and then add them to the DataFrame as a new column.

Link to the docs: https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.DataFrameGroupBy.quantile.html
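
A minimal sketch of that two-step approach, assuming a small made-up DataFrame with a 1-minute DatetimeIndex (the sample data and the column name value3_q75 are hypothetical; the other column names follow the question):

```python
import numpy as np
import pandas as pd

# Hypothetical 1-minute sample data, not taken from the question.
idx = pd.date_range("2022-08-11 00:01", periods=10, freq="min")
df1 = pd.DataFrame(
    {
        "value1": np.arange(10.0),
        "value2": np.arange(10.0) * 2,
        "value3": np.arange(1.0, 11.0),
    },
    index=idx,
)

# Build the 5-minute grouping once so both steps share the same bins.
grouped = df1.groupby(pd.Grouper(freq="5min", closed="right", label="right"))

# Step 1: the ordinary aggregations.
df2 = grouped.agg({"value1": "mean", "value2": "max"})

# Step 2: the 75th percentile of value3, computed separately and
# attached as a new column (aligned on the 5-minute index).
df2["value3_q75"] = grouped["value3"].quantile(0.75)

print(df2)
```

Because both results come from the same grouping, their indexes match and the column assignment lines up row by row.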

edited Aug 11 '22 at 19:46

answered Aug 11 '22 at 19:45


Yeah, I was confused because the 5 data points for each quantile calculation will be those 1-min interval values. Data is collected every minute ==> I want to group the data into 5-min intervals ==> add the quantiles as separate columns. – prof31 Aug 11 '22 at 20:05
There will be a quantile value for every 5-min interval, but I'm confused about how to implement it. – prof31 Aug 11 '22 at 20:07
Maybe you need to group by every 5 rows and then perform the other operations. Possible answer: https://stackoverflow.com/questions/46478518/groupby-dataframe-by-n-columns-or-n-rows – George Aug 11 '22 at 20:20

user5002062 · Accepted Answer · 2022-08-12T02:09:32.847

1

The values you pass to agg don't have to be strings: they can also be functions. You could define a custom function like

def q75(series):
    return series.quantile(0.75)

and then pass this to agg like

   df2 = df1.groupby(pd.Grouper(freq='5Min',closed='right',label='right')).agg({
                                        "value1":  "mean", "value2": "max",
                                        "value3": q75})

You can even calculate multiple statistics for the same column by passing them in a list (with q25 and q50 defined the same way as q75):

df2 = df1.groupby(pd.Grouper(freq='5Min', closed='right', label='right')).agg({
    "value1": "mean", "value2": "max", "value3": [q25, q50, q75]})

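For completeness, here is a self-contained sketch of the list form. The sample data is made up, q25 and q50 are assumed to mirror q75, and the final column-flattening step is an optional extra that is not part of the original answer:

```python
import numpy as np
import pandas as pd

def q25(series):
    return series.quantile(0.25)

def q50(series):
    return series.quantile(0.50)

def q75(series):
    return series.quantile(0.75)

# Hypothetical 1-minute sample data, not taken from the question.
idx = pd.date_range("2022-08-11 00:01", periods=10, freq="min")
df1 = pd.DataFrame(
    {
        "value1": np.arange(10.0),
        "value2": np.arange(10.0) * 2,
        "value3": np.arange(1.0, 11.0),
    },
    index=idx,
)

df2 = df1.groupby(pd.Grouper(freq="5min", closed="right", label="right")).agg(
    {"value1": "mean", "value2": "max", "value3": [q25, q50, q75]}
)

# Passing a list produces two-level column labels such as ("value3", "q75"),
# where the second level is taken from each function's name.
# Optionally flatten them into single strings like "value3_q75".
df2.columns = ["_".join(col) for col in df2.columns]

print(df2)
```

The function names matter here: pandas uses them to label the output columns, so each helper needs a distinct name.
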
edited Aug 12 '22 at 02:09

answered Aug 11 '22 at 22:13


You might also find `groupby.describe` useful: it gives a bunch of summary statistics (including all the quartiles by default) in one go. https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.DataFrameGroupBy.describe.html – user5002062 Aug 11 '22 at 22:16
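
A short sketch of the `describe` route, again on made-up 1-minute data (the sample values are hypothetical; only the column name value3 comes from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical 1-minute sample data, not taken from the question.
idx = pd.date_range("2022-08-11 00:01", periods=10, freq="min")
df1 = pd.DataFrame({"value3": np.arange(1.0, 11.0)}, index=idx)

# describe() on a single grouped column returns one row per 5-minute bin,
# with columns count, mean, std, min, 25%, 50%, 75%, max.
summary = (
    df1.groupby(pd.Grouper(freq="5min", closed="right", label="right"))["value3"]
    .describe()
)

# The 75th percentile is then just one column of the summary.
value3_q75 = summary["75%"]

print(summary)
```

This computes more than the question asks for, so it is mainly convenient for exploration; for a single quantile the custom-function or `quantile(0.75)` approach does less work.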
