0

I have data at 1-minute intervals and want to change the granularity to 5 minutes, computing basic statistics with .groupby like this:

   df2 = df1.groupby(pd.Grouper(freq='5Min', closed='right', label='right')).agg({
       "value1": "mean", "value2": "max",
       "value3": "quantile"})

I also want quartile/quantile data, but I can't find a way to specify which quantile to compute. The default is the 50th percentile (the median). How do I get the 75th percentile for value3?

Sunderam Dubey
prof31

2 Answers

1

You can use the groupby.quantile function. It lets you specify the exact quantile and even choose the interpolation method. I'm not sure it is possible to do everything in one step; you may need to compute the quantile separately and then add it to the DataFrame as a new column.

Link to the docs: https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.DataFrameGroupBy.quantile.html
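
A minimal sketch of the two-step approach described above, assuming a small made-up DataFrame with a 1-minute DatetimeIndex (the data and the `value3_q75` column name are hypothetical; the column names `value1` to `value3` follow the question):

```python
import numpy as np
import pandas as pd

# Hypothetical 1-minute data covering 00:01 .. 00:10.
idx = pd.date_range("2022-08-11 00:01", periods=10, freq="1min")
df1 = pd.DataFrame(
    {
        "value1": np.arange(10.0),
        "value2": np.arange(10.0) * 2,
        "value3": np.arange(1.0, 11.0),
    },
    index=idx,
)

# Build the 5-minute grouping once and reuse it for both steps.
grouped = df1.groupby(pd.Grouper(freq="5min", closed="right", label="right"))

# Step 1: the aggregations that have string shortcuts.
df2 = grouped.agg({"value1": "mean", "value2": "max"})

# Step 2: the 75th percentile of value3 per group, added as a new column.
# Both results share the same 5-minute index, so the assignment aligns.
df2["value3_q75"] = grouped["value3"].quantile(0.75)

print(df2)
```

Reusing the same `grouped` object keeps the bin edges identical for both steps, so the new column lines up with the existing rows.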

George
  • Yeah, I was confused because the 5 data points for each quantile calculation will be those 1-minute values. Data is collected every minute, I want to group it into 5-minute intervals, and then add the quantiles as separate columns. – prof31 Aug 11 '22 at 20:05
  • There will be a quantile value for every 5-minute interval, but I'm confused about how to implement it. – prof31 Aug 11 '22 at 20:07
  • Maybe you need to group by 5 rows and then perform the other operations. Possible answer: https://stackoverflow.com/questions/46478518/groupby-dataframe-by-n-columns-or-n-rows – George Aug 11 '22 at 20:20
1

The values you pass to agg don't have to be strings: they can also be functions. You could define a custom function like

def q75(series):
    return series.quantile(0.75)

and then pass this to agg like

   df2 = df1.groupby(pd.Grouper(freq='5Min', closed='right', label='right')).agg({
       "value1": "mean", "value2": "max",
       "value3": q75})

You can even calculate multiple quantiles for the same column by passing the functions in a list (with q25 and q50 defined the same way as q75):

df2 = df1.groupby(pd.Grouper(freq='5Min', closed='right', label='right')).agg({
    "value1": "mean", "value2": "max", "value3": [q25, q50, q75]})
user5002062
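
The custom-function approach from this answer can be sketched end to end as follows. The `make_quantile` helper and the sample data are hypothetical, introduced only for illustration; the helper just avoids writing one function per quantile and sets `__name__` so the output columns get readable labels.

```python
import numpy as np
import pandas as pd


def make_quantile(q):
    """Hypothetical helper: build an aggregation function for quantile q."""
    def _quantile(series):
        return series.quantile(q)
    # pandas uses the function name as the column label in the result.
    _quantile.__name__ = f"q{int(round(q * 100))}"
    return _quantile


q25, q50, q75 = (make_quantile(q) for q in (0.25, 0.50, 0.75))

# Hypothetical 1-minute data covering 00:01 .. 00:10.
idx = pd.date_range("2022-08-11 00:01", periods=10, freq="1min")
df1 = pd.DataFrame(
    {
        "value1": np.arange(10.0),
        "value2": np.arange(10.0) * 2,
        "value3": np.arange(1.0, 11.0),
    },
    index=idx,
)

df2 = df1.groupby(pd.Grouper(freq="5min", closed="right", label="right")).agg(
    {"value1": "mean", "value2": "max", "value3": [q25, q50, q75]}
)

# Because one column maps to a list, the result has two-level column labels,
# e.g. ("value3", "q75").
print(df2)
```

If flat column names are preferred, one option is to join the two levels afterwards, for example `df2.columns = ["_".join(c) for c in df2.columns]`.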
  • You might also find `groupby.describe` useful: it gives a set of summary statistics (including all the quartiles by default) in one go. https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.DataFrameGroupBy.describe.html – user5002062 Aug 11 '22 at 22:16
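
As a rough illustration of the `describe` suggestion in the comment above, here is a sketch on made-up 1-minute data (the data is hypothetical; only `value3` from the question is used):

```python
import numpy as np
import pandas as pd

# Hypothetical 1-minute data covering 00:01 .. 00:10.
idx = pd.date_range("2022-08-11 00:01", periods=10, freq="1min")
df1 = pd.DataFrame({"value3": np.arange(1.0, 11.0)}, index=idx)

# describe() on a grouped column returns one row per 5-minute bin, with
# count, mean, std, min, the three quartiles, and max as columns.
summary = df1.groupby(
    pd.Grouper(freq="5min", closed="right", label="right")
)["value3"].describe()

print(summary[["25%", "50%", "75%"]])
```

This computes more statistics than the question asks for, so it is convenient for exploration but may be wasteful if only one quantile is needed.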