Pandas: df.groupby(x, y).apply() across multiple columns parameter error

Question

Basic Problem:

I have several 'past' and 'present' variables on which I'd like to perform a simple 'row-wise' percent change. For example: ((exports_now - exports_past)/exports_past).

Two existing questions accomplish this, but when I try a similar method I get an error saying that my function deltas received an unexpected parameter, axis.

Data Example :

exports_past    exports_now    imports_past    imports_now    etc. (6 other pairs)
   .23               .45             .43             .22              1.23
   .13               .21             .47             .32               .23
    0                 0              .41             .42               .93
   .23               .66             .43             .22               .21
    0                .12             .47             .21              1.23

Following the answer in the first question, my solution is to use a function like this:

import numpy as np

def deltas(row):
    '''
    simple pct change
    '''
    if int(row[0]) == 0 and int(row[1]) == 0:
        return 0
    elif int(row[0]) == 0:
        return np.nan
    else:
        return ((row[1] - row[0])/row[0])

And apply the function like this:

df['exports_delta'] = df.groupby(['exports_past', 'exports_now']).apply(deltas, axis=1)

This generates the error:

TypeError: deltas() got an unexpected keyword argument 'axis'

Any ideas on how to get around the axis parameter error, or a more elegant way to calculate the percent change? The kicker is that I need to be able to apply this function across several different column pairs, so hard-coding the column names, as the answer to the second question does, is undesirable. Thanks!

Andy Hayden · Answer 1 · 2013-07-31T15:12:18.823

7

Consider using the pct_change Series/DataFrame method to do this.

df.pct_change()
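As a rough sketch of what pct_change computes (the small frame and the names by_rows and by_cols are made up for illustration, not taken from the question): by default it compares each row with the previous row in the same column, while axis=1 compares each column with the one to its left, which is closer to a past/now calculation.

```python
import pandas as pd

# Illustrative data only; not the asker's real frame.
df = pd.DataFrame({"exports_past": [2.0, 4.0, 5.0],
                   "exports_now": [3.0, 8.0, 10.0]})

# Default: change relative to the previous row, column by column.
by_rows = df.pct_change()

# axis=1: change relative to the column to the left, row by row.
by_cols = df.pct_change(axis=1)

print(by_rows)
print(by_cols["exports_now"])  # (now - past) / past for each row
```

With axis=1 the first column has nothing to its left, so it comes back as NaN. A zero in the 'past' column gives inf or NaN rather than the 0/NaN convention used by deltas in the question, so that case would still need separate handling.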

The confusion stems from two different (but equally named) apply functions, one on Series/DataFrame and one on groupby.

In [11]: df
Out[11]:
   0  1  2
0  1  1  1
1  2  2  2

The DataFrame apply method takes an axis argument:

In [12]: df.apply(lambda x: x[0] + x[1], axis=0)
Out[12]:
0    3
1    3
2    3
dtype: int64

In [13]: df.apply(lambda x: x[0] + x[1], axis=1)
Out[13]:
0    2
1    4
dtype: int64

The groupby apply doesn't take an axis argument, so the kwarg is passed through to the applied function (here g is a groupby object on df):

In [14]: g.apply(lambda x: x[0] + x[1])
Out[14]:
0    2
1    4
dtype: int64

In [15]: g.apply(lambda x: x[0] + x[1], axis=1)
TypeError: <lambda>() got an unexpected keyword argument 'axis'
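The difference between the two apply methods can be reproduced on data shaped like the question's. This is only a sketch under assumptions: total is a hypothetical stand-in function, and this version of deltas uses positional .iloc access and compares against zero directly (int(0.23) is 0, so an int() cast would treat every value below 1 as zero).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"exports_past": [0.23, 0.13, 0.0, 0.23, 0.0],
                   "exports_now": [0.45, 0.21, 0.0, 0.66, 0.12]})

def total(values):
    """Hypothetical helper that accepts no keyword arguments."""
    return values.sum()

# groupby apply forwards extra keyword arguments to the function,
# so axis=1 ends up in total() and raises TypeError.
error_message = None
try:
    df.groupby("exports_past")["exports_now"].apply(total, axis=1)
except TypeError as exc:
    error_message = str(exc)

def deltas(row):
    """Percent change from the first to the second value of a row."""
    past, now = row.iloc[0], row.iloc[1]  # positional, so column names don't matter
    if past == 0 and now == 0:
        return 0.0
    if past == 0:
        return np.nan
    return (now - past) / past

# DataFrame apply does take axis; axis=1 hands each row to deltas.
df["exports_delta"] = df[["exports_past", "exports_now"]].apply(deltas, axis=1)

print(error_message)
print(df)
```

Selecting the two columns before calling apply means no groupby is needed at all, and the same call works for any other past/now pair.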

Note that groupby does have an axis argument, so you can use it there if you really want to:

In [16]: g1 = df.groupby(0, axis=1)

In [17]: g1.apply(lambda x: x.iloc[0, 0] + x.iloc[1, 0])
Out[17]:
0
1    3
2    3
dtype: int64

edited Jul 31 '13 at 15:12

answered Jul 31 '13 at 15:07


Thanks for your answer Andy. If I stick with the groupby apply and remove the axis param, I get a key error `KeyError: u'no item named 0'` when accessing the elements as `row[0]`, etc. Is there a way to use the groupby apply and still use a notation that keeps it easy to apply to several differently named column pairs? – agconti Jul 31 '13 at 15:12
I thought about the `df.pct_change()` function, but I believe it only applies to a single column, i.e. `df.pct_change(self, periods=1, fill_method='pad', limit=None, freq=None, **kwd)`. I haven't checked the source, but I believe it works through something similar to the `.shift()` method. If that is true, I'm not sure it can be applied to multiple columns. – agconti Jul 31 '13 at 15:17
@agconti updated: you can use the groupby with axis=1, or you can apply pct_change to the entire dataframe. Or perhaps you want to do this on each group using an apply (`lambda x: x.pct_change()`). – Andy Hayden Jul 31 '13 at 15:20
I think I wasn't 100% clear in my post (I've updated it). I'm looking to do the pct_change() calculation not by shifting periods within exports_past and exports_now, but by using those values directly, i.e. `((exports_now - exports_past)/exports_past)`. – agconti Jul 31 '13 at 15:23
Passing axis=1 to groupby results in a `ValueError: Wrong number of items passed 1, indices imply 0` when used like this: `df['xx_delta'] = df.groupby(['xx_past', 'xx_now'], axis=1).apply(deltas)` – agconti Jul 31 '13 at 15:28
@agconti you need to tweak deltas a bit, see my last example (assuming that's from the apply) – Andy Hayden Jul 31 '13 at 15:40
Thanks for all of your suggestions. Switching the row[i] calls to row.iloc[0,i] yields the same error: ValueError: Wrong number of items passed 1, indices imply 0. I get the feeling that the solution is obvious and I'm making a silly conceptual error here. Can you point me to a reference that might help me get going in the right direction? For some reason this puzzle is throwing me for a loop. Thanks! – agconti Jul 31 '13 at 16:17
@agconti .iloc[i, 0] ? :s – Andy Hayden Jul 31 '13 at 16:20
@agconti not sure I have a good reference, but a good trick is to set up a break point in the function you're going to apply and then see how you can access things you want – Andy Hayden Jul 31 '13 at 16:25
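For the calculation described in the comments, ((exports_now - exports_past)/exports_past) across several column pairs, one possible approach avoids groupby and apply altogether and works on whole columns. This is a sketch, not the answerer's method: pct_delta and prefixes are hypothetical names, the frame reuses the first four columns of the question's sample data, and the pairs are assumed to share a prefix with _past and _now suffixes.

```python
import numpy as np
import pandas as pd

def pct_delta(past, now):
    """Row-wise (now - past) / past for two aligned Series.

    Returns 0 where both values are 0 and NaN where only past is 0.
    """
    safe_past = past.where(past != 0)   # zeros become NaN, so no division by zero
    delta = (now - past) / safe_past    # NaN wherever past was 0
    return delta.mask((past == 0) & (now == 0), 0.0)

df = pd.DataFrame({
    "exports_past": [0.23, 0.13, 0.0, 0.23, 0.0],
    "exports_now":  [0.45, 0.21, 0.0, 0.66, 0.12],
    "imports_past": [0.43, 0.47, 0.41, 0.43, 0.47],
    "imports_now":  [0.22, 0.32, 0.42, 0.22, 0.21],
})

# Derive the pair prefixes from the column names instead of hard-coding them.
prefixes = [c[:-len("_past")] for c in df.columns if c.endswith("_past")]
for prefix in prefixes:
    df[prefix + "_delta"] = pct_delta(df[prefix + "_past"], df[prefix + "_now"])

print(df)
```

Because the arithmetic runs on whole columns, no per-row Python function call is involved, and adding another pair only requires that its columns follow the same naming pattern.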
