
Consider the DataFrame `df`:

import numpy as np
import pandas as pd

df = pd.DataFrame(dict(
    A=list('xxxyyy'),
    B=[np.nan, 1, 2, 3, 4, np.nan]
))

df

   A    B
0  x  NaN
1  x  1.0
2  x  2.0
3  y  3.0
4  y  4.0
5  y  NaN

I can pass a single function to `agg`, along with a keyword argument, like this:

df.groupby('A').B.agg(pd.Series.head, n=1)

A
x    NaN
y    3.0
Name: B, dtype: float64
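As a rough mental model (a simplified sketch, not pandas' actual code path), `agg(pd.Series.head, n=1)` calls the function on each group's Series, forwards the extra keyword argument, and keeps the single value it returns. The loop below does that by hand; `result` is just an illustrative name:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(dict(
    A=list('xxxyyy'),
    B=[np.nan, 1, 2, 3, 4, np.nan]
))

# Simplified model of agg(pd.Series.head, n=1): call the function on each
# group's Series, forwarding n=1, and keep the one value it returns.
result = pd.Series(
    {name: pd.Series.head(group, n=1).iloc[0]
     for name, group in df.groupby('A').B},
    name='B',
)
result.index.name = 'A'
print(result)
```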

However, I want to run the aggregation with both `pd.Series.head` and `pd.Series.tail`, passing the argument `n=1` to each of them.

I want the result of this aggregation to look like the output below. Note that I can already produce this result; my goal is to figure out how to pass arguments to the multiple functions passed to `agg`.

If it can't be done, an explanation of why would be a valid answer.

     h    t
A          
x  NaN  2.0
y  3.0  NaN

Added Incentive
If you figure this out, it would be a better solution than the one I have for this question. I would encourage whoever answers this one to also answer that one.

piRSquared
  • Use lambdas? `lambda x: x.head(1)`? – DYZ Aug 17 '17 at 21:31
  • Yes, you can see that I do that as my answer for the other question. But I'm trying to figure out the actual mechanics of passing arguments to multiple functions within an aggregation. – piRSquared Aug 17 '17 at 21:32
  • Not part of the pandas API, but if you're just passing arguments, you may find it cleaner to wrap your functions with `functools.partial`, e.g. `partial(pd.Series.head, n=1)`. Depends on your definition of "clean", though. You retain function names with `partial`, whereas `lambda` loses them: my `partial` example above, passed in a list, gives 'head' as the column name. – root Aug 17 '17 at 22:43
  • I like it. Definitely cleaner than a `lambda` when considering the name aspect. – piRSquared Aug 17 '17 at 22:45
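To illustrate the point about names: a lambda is always called `<lambda>`, while a `functools.partial` object has no `__name__` of its own but keeps the wrapped function in `.func`. As far as I can tell, pandas looks through to `.func` when it picks a column name, which is why the `partial` version ends up labelled 'head'. A small sketch using plain Python attributes:

```python
from functools import partial

import pandas as pd

head_lambda = lambda s: s.head(1)
head_partial = partial(pd.Series.head, n=1)

print(head_lambda.__name__)                # <lambda>
print(hasattr(head_partial, '__name__'))   # False: partial objects have no __name__
print(head_partial.func.__name__)          # head: the wrapped function keeps its name
```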

2 Answers


If I understand the source code correctly, it can't be done:

def aggregate(self, func_or_funcs, *args, **kwargs):
    _level = kwargs.pop('_level', None)
    if isinstance(func_or_funcs, compat.string_types):
        return getattr(self, func_or_funcs)(*args, **kwargs)  # NOTE: (*args, **kwargs) are passed to the function

    if hasattr(func_or_funcs, '__iter__'):
        ret = self._aggregate_multiple_funcs(func_or_funcs,    # NOTE: `*args, **kwargs` got lost ...
                                             (_level or 0) + 1)  
    ...

Note: if `func_or_funcs` has an `__iter__` attribute (e.g. it is a list of functions), `*args` and `**kwargs` are ignored.
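Since the list branch drops `**kwargs` (at least in the version quoted above), one workaround is to bind the keyword arguments to each function yourself before calling `agg`. `agg_with_kwargs` below is a hypothetical helper, not part of pandas; it simply wraps every function with `functools.partial`:

```python
from functools import partial

import numpy as np
import pandas as pd


def agg_with_kwargs(grouped, funcs, **kwargs):
    """Hypothetical helper (not part of pandas): bind the same keyword
    arguments to every function, then pass the list to agg, because the
    list branch quoted above does not forward **kwargs itself."""
    return grouped.agg([partial(f, **kwargs) for f in funcs])


df = pd.DataFrame(dict(
    A=list('xxxyyy'),
    B=[np.nan, 1, 2, 3, 4, np.nan]
))

result = agg_with_kwargs(df.groupby('A').B, [pd.Series.head, pd.Series.tail], n=1)
print(result)
```

The column names come from the wrapped functions ('head' and 'tail'), so you would still need to rename them to get `h` and `t`.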

MaxU - stand with Ukraine

You can pass lambdas in a dictionary to `agg`:

df.groupby('A').B.agg({'h': lambda s: s.head(1), 't': lambda s: s.tail(1)})

But this is deprecated and may stop working in a future version:

FutureWarning: using a dict on a Series for aggregation is deprecated and will be removed in a future version

Instead, I prefer to pass a list of functions and rename the lambdas; otherwise, because every lambda is named `<lambda>`, pandas raises

SpecificationError: Function names must be unique, found multiple named

h = lambda s: s.head(1)
h.__name__ = 'h'
t = lambda s: s.tail(1)
t.__name__ = 't'
df.groupby('A').B.agg([h, t])

     h    t
A          
x  NaN  2.0
y  3.0  NaN

It may seem that 5 lines is too much, but the lines are quite short!
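If the goal is only to control the column names, a list of `(name, function)` tuples may be shorter than setting `__name__` by hand. This is a sketch assuming your pandas version accepts such tuples in `SeriesGroupBy.agg` (I believe it does, but it is worth checking); combined with `functools.partial`, it also passes `n=1` to both functions:

```python
from functools import partial

import numpy as np
import pandas as pd

df = pd.DataFrame(dict(
    A=list('xxxyyy'),
    B=[np.nan, 1, 2, 3, 4, np.nan]
))

# Each (name, func) tuple sets the output column name directly, so the
# functions need no __name__, and n=1 is bound to each one with partial.
result = df.groupby('A').B.agg([
    ('h', partial(pd.Series.head, n=1)),
    ('t', partial(pd.Series.tail, n=1)),
])
print(result)
```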

One possible workaround for passing additional keyword arguments to the aggregation functions is `functools.partial`:

from functools import partial

df.groupby('A').B.agg([partial(pd.Series.head, n=1),
                       partial(pd.Series.tail, n=1)])

   head  tail
A            
x   NaN   2.0
y   3.0   NaN
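In newer pandas versions (0.25 and later, as far as I know), named aggregation gives the `h`/`t` columns without a list. Note that `n=1` still can't be passed at the top level here, because every keyword argument is read as an `output_name=function` pair, so `partial` is still doing the work. A sketch, assuming pandas 0.25 or later:

```python
from functools import partial

import numpy as np
import pandas as pd

df = pd.DataFrame(dict(
    A=list('xxxyyy'),
    B=[np.nan, 1, 2, 3, 4, np.nan]
))

# Named aggregation: keyword = output column name, value = function.
# n=1 is bound inside each partial, since a bare n=1 here would be taken
# as another named aggregation rather than an argument.
result = df.groupby('A').B.agg(
    h=partial(pd.Series.head, n=1),
    t=partial(pd.Series.tail, n=1),
)
print(result)
```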
mr.tarsa
  • Thank you for the answer. However, as I've said [**here**](https://stackoverflow.com/questions/45745029/can-and-how-do-i-pass-arguments-to-multiple-functions-used-within-an-aggregation/45745429#comment78448010_45745029), I'm trying to use the pandas API to pass arguments to the functions. I'm able to do it with one function. How do I do it with multiple functions? – piRSquared Aug 17 '17 at 21:51