I have a simple function:

def f(returns):
    base = (1 + returns.sum()) / (1 + returns).prod()
    base = pd.Series([base] * len(returns))
    exp = returns.abs() / returns.abs().sum()

    return (1 + returns) * base.pow(exp) - 1.0

and a DataFrame:

df = pd.DataFrame([[.1,.2,.3],[.4,.5,.6],[.7,.8,.9]], columns=['A', 'B', 'C'])

I can do this:

df.apply(f)

          A         B         C
0  0.084169  0.159224  0.227440
1  0.321130  0.375803  0.426375
2  0.535960  0.567532  0.599279

However, the transposition:

df.transpose().apply(f)

produces an unexpected result:

    0   1   2
0 NaN NaN NaN
1 NaN NaN NaN
2 NaN NaN NaN
A NaN NaN NaN
B NaN NaN NaN
C NaN NaN NaN

Now, I can manually transpose the DataFrame:

df2 = pd.DataFrame([[1., 4., 7.],[2., 5., 8.], [3., 6., 9.]], columns=['A', 'B', 'C'])
df2.apply(f)

          A         B         C
0  0.628713  1.516577  2.002160
1  0.989529  1.543616  1.936151
2  1.160247  1.499530  1.836141

I don't understand why I can't simply transpose and then apply the function to each row of the DataFrame. In fact, I don't know why I can't do this either:

df.apply(f, axis=1)

    0   1   2   A   B   C
0 NaN NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN NaN
slaw
  • Because pandas tries to align the returned Series against the original index. Transposing swaps the columns and the index, so the labels no longer match and you get an expanded df filled with NaNs – EdChum Jan 14 '16 at 18:49
  • @EdChum: Can you tell me which line is the problem? What about `df.apply(f, axis=1)` (see edited code at the bottom)? Is that failing for the same reason? – slaw Jan 14 '16 at 18:52
  • Same reason. See a related answer of mine that demonstrates how broadcasting and alignment work here: http://stackoverflow.com/questions/29954263/what-does-the-term-broadcasting-mean-in-pandas-documentation/29955358#29955358 – EdChum Jan 14 '16 at 18:55
  • Your `base` calculation returns a Series with index `[0,1,2]`, but `returns` has index `['A','B','C']`, so you get this stacked Series where nothing aligns, hence the `NaN`s – EdChum Jan 14 '16 at 18:59

1 Answer

As EdChum says, the problem is that pandas tries to align the index of the Series you create inside f with the index of the Series passed to f (a column of the DataFrame, or a row when axis=1). This happens to work in your first example because you don't specify an index in the pd.Series call, so it gets the default index 0, 1, 2, which matches the index of your original DataFrame. If the original DataFrame has any other index, it fails right away:

>>> df = pd.DataFrame([[.1,.2,.3],[.4,.5,.6],[.7,.8,.9]], columns=['A', 'B', 'C'], index=[8, 9, 10])
>>> df.apply(f)
     A   B   C
0  NaN NaN NaN
1  NaN NaN NaN
2  NaN NaN NaN
8  NaN NaN NaN
9  NaN NaN NaN
10 NaN NaN NaN
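
The same alignment behaviour can be reproduced without apply at all. The following is only a minimal sketch with made-up values (the names a, b, misaligned and aligned are not from the question); it shows that arithmetic between two Series matches on labels rather than positions:

```python
import pandas as pd

# Illustrative values only; not taken from the question.
a = pd.Series([1.0, 2.0, 3.0], index=['A', 'B', 'C'])
b = pd.Series([10.0, 10.0, 10.0])  # no index given, so labels are 0, 1, 2

# Labels are matched, not positions. No label appears in both Series,
# so the result covers all six labels and every value is NaN.
misaligned = a * b

# Giving the second Series the same labels makes the values line up.
aligned = a * pd.Series([10.0] * len(a), index=a.index)

print(misaligned)
print(aligned)
```

This is what happens inside f when returns is labelled A, B, C (or 8, 9, 10) while base is labelled 0, 1, 2.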

To fix it, explicitly create the new Series with the same index as the data passed to f. Change the line inside f to:

base = pd.Series([base] * len(returns), index=returns.index)

Then:

>>> df.apply(f)
           A         B         C
8   0.084169  0.159224  0.227440
9   0.321130  0.375803  0.426375
10  0.535960  0.567532  0.599279
>>> df.T.apply(f)
          8         9         10
A  0.087243  0.293863  0.453757
B  0.172327  0.359225  0.505245
C  0.255292  0.421544  0.553746
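
A possible alternative, not part of the fix above: since base holds a single repeated value, it could be kept as a plain scalar, which avoids index alignment for that term altogether. The sketch below assumes a renamed variant, f_scalar (a hypothetical name), and reuses the DataFrame with index 8, 9, 10:

```python
import pandas as pd

def f_scalar(returns):
    # Hypothetical variant of f: base stays a scalar instead of a Series,
    # so there is no second index to align against returns.
    base = float((1 + returns.sum()) / (1 + returns).prod())
    exp = returns.abs() / returns.abs().sum()
    # scalar ** Series raises the scalar to each element of exp
    return (1 + returns) * base ** exp - 1.0

df = pd.DataFrame([[.1, .2, .3], [.4, .5, .6], [.7, .8, .9]],
                  columns=['A', 'B', 'C'], index=[8, 9, 10])

by_column = df.apply(f_scalar)        # f_scalar receives each column
by_row = df.apply(f_scalar, axis=1)   # f_scalar receives each row

print(by_column)
print(by_row)
```

With axis=1 the result keeps the original orientation (rows 8, 9, 10 and columns A, B, C), so it should hold the same values as df.T.apply(f), only transposed.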
BrenBarn