
Does anyone know how to pass arguments to a function in groupby.agg() when aggregating with multiple functions?

Ultimately, I would like to use this with a custom function, but I will ask my question using a built-in function that takes an argument.

Assuming:

import pandas as pd
import numpy as np
import datetime
np.random.seed(15)
day = datetime.date.today()
day_1 = datetime.date.today() - datetime.timedelta(1)
day_2 = datetime.date.today() - datetime.timedelta(2)
day_3 = datetime.date.today() - datetime.timedelta(3)
ticker_date = [('fi', day), ('fi', day_1), ('fi', day_2), ('fi', day_3),
               ('di', day), ('di', day_1), ('di', day_2), ('di', day_3)]
index_df = pd.MultiIndex.from_tuples(ticker_date, names=['lvl_1', 'lvl_2'])
df = pd.DataFrame(np.random.rand(8), index_df, ['value'])

How would I do this:

df.groupby('lvl_1').agg(['min','max','quantile'])

with the following value as the argument to 'quantile':

q = 0.22 
Nicholas Sizer
marco

2 Answers


Use a lambda function:

q = 0.22
df1 = df.groupby('lvl_1')['value'].agg(['min','max',lambda x: x.quantile(q)])
print(df1)
            min       max  <lambda>
lvl_1                              
di     0.275401  0.530000  0.294589
fi     0.054363  0.848818  0.136555
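To check the lambda approach without the random data, here is a minimal sketch on a small, made-up frame (the `fi`/`di` values are hypothetical, and `lvl_1` is an ordinary column here rather than an index level; `groupby('lvl_1')` accepts either). With pandas' default linear interpolation, the 0.22 quantile of 1, 2, 3, 4 sits at position 0.22 * 3 = 0.66, i.e. 1 + 0.66 * (2 - 1) = 1.66. The label of an unnamed lambda column may differ between pandas versions (the output above shows `<lambda>`, newer releases may show `<lambda_0>`), so the sketch selects that column by position.

```python
import pandas as pd

# Hypothetical, deterministic stand-in for the question's random data.
df = pd.DataFrame({
    'lvl_1': ['fi', 'fi', 'fi', 'fi', 'di', 'di', 'di', 'di'],
    'value': [1.0, 2.0, 3.0, 4.0, 10.0, 20.0, 30.0, 40.0],
})

q = 0.22
df1 = df.groupby('lvl_1')['value'].agg(['min', 'max', lambda x: x.quantile(q)])

# The lambda's column label depends on the pandas version ('<lambda>' or
# '<lambda_0>'), so pick the third column by position instead of by name.
custom = df1.iloc[:, 2]
print(df1)
```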

Alternatively, create a function f and set its __name__ attribute to get a custom column name:

q = 0.22
f = lambda x: x.quantile(q)
f.__name__ = 'custom_quantile'
df1 = df.groupby('lvl_1')['value'].agg(['min','max',f])
print(df1)
            min       max  custom_quantile
lvl_1                                     
di     0.275401  0.530000         0.294589
fi     0.054363  0.848818         0.136555
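Setting `__name__` also makes it possible to request several quantiles in one call. One caveat, sketched below on small made-up data: a plain `lambda x: x.quantile(q)` created inside a loop looks `q` up when pandas calls it, after the loop has finished, so every function would use the last value. Binding `q=q` as a default argument freezes the value per function. The `quantile_<q>` column names are only an illustrative choice.

```python
import pandas as pd

# Hypothetical, deterministic data in the shape of the question's example.
df = pd.DataFrame({
    'lvl_1': ['fi', 'fi', 'fi', 'fi', 'di', 'di', 'di', 'di'],
    'value': [1.0, 2.0, 3.0, 4.0, 10.0, 20.0, 30.0, 40.0],
})

funcs = []
for q in (0.22, 0.5):
    # q=q binds the current loop value as a default argument; pandas calls
    # f(group) with one argument, so the default is what gets used.
    f = lambda x, q=q: x.quantile(q)
    f.__name__ = f'quantile_{q}'  # becomes the output column label
    funcs.append(f)

df1 = df.groupby('lvl_1')['value'].agg(['min', 'max', *funcs])
print(df1)
```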
jezrael
  • Awesome, that's the second time you've helped me out! I like the second option because, ultimately, I am looking to use custom functions. Thanks mate – marco Feb 17 '18 at 17:35
  • Is there a practical difference between creating a named lambda and just using a def statement? – RedPanda Jun 19 '19 at 17:11
  • @RedPanda - in my opinion `def` is more common, but it is the same, with some [exceptions](https://stackoverflow.com/a/33577016) – jezrael Jun 20 '19 at 05:51
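On the `def` question: a function created with `def` gets a proper `__name__` automatically, and an outer function can carry the extra argument. A sketch under those assumptions; `make_quantile` is a hypothetical helper name, and the data are made up.

```python
import pandas as pd

# Hypothetical, deterministic data in the shape of the question's example.
df = pd.DataFrame({
    'lvl_1': ['fi', 'fi', 'fi', 'fi', 'di', 'di', 'di', 'di'],
    'value': [1.0, 2.0, 3.0, 4.0, 10.0, 20.0, 30.0, 40.0],
})

def make_quantile(q):
    """Hypothetical helper: return a one-argument aggregation for the q-th quantile."""
    def custom_quantile(x):
        return x.quantile(q)  # q is captured from this particular call
    return custom_quantile   # __name__ is 'custom_quantile', used as the column label

df1 = df.groupby('lvl_1')['value'].agg(['min', 'max', make_quantile(0.22)])
print(df1)
```

Each call to `make_quantile` creates a new closure with its own `q`, so there is no late-binding surprise of the kind a lambda inside a loop can have.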
Pass a (column name, function) tuple in the list to name the result column inline:

df1 = df.groupby('lvl_1')['value'].agg(['min','max',("custom_quantile",lambda x: x.quantile(q))])

For q = 0.22, the output is:

            min       max  custom_quantile
lvl_1
di     0.275401  0.530000         0.294589
fi     0.054363  0.848818         0.136555
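The tuple form can be compared with pandas' named aggregation (available since pandas 0.25), where each keyword argument becomes an output column name. A minimal sketch on made-up data, checking that both forms produce the same frame:

```python
import pandas as pd

# Hypothetical, deterministic data in the shape of the question's example.
df = pd.DataFrame({
    'lvl_1': ['fi', 'fi', 'fi', 'fi', 'di', 'di', 'di', 'di'],
    'value': [1.0, 2.0, 3.0, 4.0, 10.0, 20.0, 30.0, 40.0],
})
q = 0.22

# (name, function) tuples inside the list, as in the answer above.
by_tuple = df.groupby('lvl_1')['value'].agg(
    ['min', 'max', ('custom_quantile', lambda x: x.quantile(q))]
)

# Named aggregation: keyword = output column name, value = function or string.
by_keyword = df.groupby('lvl_1')['value'].agg(
    min='min', max='max', custom_quantile=lambda x: x.quantile(q)
)

print(by_keyword)
```

The keyword form reads like the desired output, but literal keywords must be valid Python identifiers; the tuple form accepts any string as a column name.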
Itamar Mushkin
  • I doubt that this helps, or even works at all. To convince me otherwise, please explain how this works and why it is supposed to help. – Yunnosch May 27 '20 at 08:29
  • It works (I added the output), and it is similar to the first option in jezrael's answer, although slightly better because it names the column inline. – Itamar Mushkin May 27 '20 at 08:47
  • Providing output which makes the "it works, honest" more plausible is appreciated. However, an explanation of how it works and why it is supposed to help would be better. StackOverflow is about sharing knowledge and helping people to understand. Not for providing code to solve problems (tested or not). Please help to fight the misunderstanding that StackOverflow is a platform for finding unpaid programmers to do work for others. – Yunnosch May 27 '20 at 14:33