
I'm trying to transform a column (well, many columns) of return data into a column of closing prices. In Clojure I'd use reductions, which is like reduce but returns a sequence of all the intermediate values.

e.g.

$ c

0.12
-.13
0.23
0.17
0.29
-0.11

# something like this
$ c.reductions(lambda accumulator, ret: accumulator * (1 + ret), init=1)

1.12
0.97
1.20
1.40
1.81
1.61

NB: The actual closing price doesn't matter, hence using 1 as the initial value. I just need a "mock" closing price.
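
(Aside for later readers: plain Python grew an equivalent of Clojure's reductions in itertools.accumulate — a minimal sketch outside pandas, assuming Python ≥ 3.8 for the initial= keyword, and noting that the seed itself is yielded first:

from itertools import accumulate

returns = [0.12, -0.13, 0.23, 0.17, 0.29, -0.11]

# Like Clojure's reductions: a reduce that yields every intermediate value.
# With initial=, the seed (1) is the first element of the result.
prices = list(accumulate(returns, lambda acc, ret: acc * (1 + ret), initial=1))
# [1, 1.12, 0.9744, 1.198512, 1.40225904, 1.8089141616, 1.609933...]

)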

My data's actual structure is a DataFrame of named columns of TimeSeries. I guess I'm looking for a function similar to applymap, but I'd rather not do something hacky with that function and reference the DataFrame from within it (though I suppose that's one possible solution to this problem).

Additionally, what would I do if I wanted to keep the returns data, but have the closing "price" with it? Should I return a tuple instead, and have the TimeSeries be of the type (returns, closing_price)?

Isaac
  • Ah, the function itself isn't the issue (should've written that above); that's easy enough. It's the (idiomatic, proper) application to the columns of the DataFrame that's the problem for me. Thanks, though! – Isaac Jan 26 '13 at 22:37
  • Your description of your problem is a little hard to grasp in the abstract. Can you provide sample data in your actual format with an example of what you actually want it to do? – BrenBarn Jan 26 '13 at 23:03

3 Answers


It doesn't look like it's a well-publicized feature yet, but you can use expanding_apply to do the returns calculation:

In [1]: s
Out[1]:
0    0.12
1   -0.13
2    0.23
3    0.17
4    0.29
5   -0.11

In [2]: pd.expanding_apply(s, lambda w: reduce(lambda x, y: x * (1 + y), w, 1))
Out[2]:
0    1.120000
1    0.974400
2    1.198512
3    1.402259
4    1.808914
5    1.609934

I'm not 100% certain, but I believe expanding_apply applies the function to the series from the first index up through the current index. Inside it I use the built-in reduce function, which works just like your Clojure reduce.
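
(A note for later pandas versions: the top-level pd.expanding_apply was eventually removed in favor of the .expanding().apply() method — a minimal sketch of the equivalent call, assuming a pandas new enough to accept raw=True:

from functools import reduce  # reduce is no longer a builtin in Python 3

result = s.expanding().apply(
    lambda w: reduce(lambda acc, ret: acc * (1 + ret), w, 1),
    raw=True,  # pass each expanding window as a plain ndarray
)

)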

Docstring for expanding_apply:

Generic expanding function application

Parameters
----------
arg : Series, DataFrame
func : function
    Must produce a single value from an ndarray input
min_periods : int
    Minimum number of observations in window required to have a value
freq : None or string alias / date offset object, default=None
    Frequency to conform to before computing statistic
center : boolean, default False
    Whether the label should correspond with center of window

Returns
-------
y : type of input argument
Zelazny7
  • This is a very interesting function, I'm looking forward to playing with this. Thanks. – Andy Hayden Jan 27 '13 at 03:48
  • Thanks! This works, except that I need to figure out how to deal with NaN data here (right now, just using a simple check e.g. `lambda x, y: x * (1+y) if pd.notnull(y) else x`, but that's not a great way of doing it, for obvious reasons… – Isaac Jan 27 '13 at 04:07
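
(On the NaN question in that last comment: skipping a missing return is the same as treating it as a zero return, which vectorizes cleanly — a small sketch with made-up data:

import numpy as np
import pandas as pd

s = pd.Series([0.12, np.nan, 0.23, 0.17, np.nan, -0.11])

# fillna(0) makes a missing return multiply the accumulator by 1,
# matching the `if pd.notnull(y) else x` lambda above
prices = s.fillna(0).add(1).cumprod()

)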

It's worth noting that in pandas it's often faster (as well as easier to understand) to write things out explicitly rather than as a reduce.

In your specific example I would just add 1 and then take the cumulative product with cumprod:

In [2]: c.add(1).cumprod()
Out[2]: 
0    1.120000
1    0.974400
2    1.198512
3    1.402259
4    1.808914
5    1.609934

or perhaps `init * c.add(1).cumprod()`.

Note: in some cases, however (for example where memory is an issue), you may have to rewrite these in a more low-level/clever way, but it's usually worth trying the simplest method first and benchmarking against it (e.g. with %timeit or memory profiling).
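
Since pandas arithmetic and cumprod operate column-wise, the same one-liner also covers the DataFrame-of-columns case from the question with no explicit loop — a sketch with hypothetical column names:

import pandas as pd

# hypothetical DataFrame of per-column returns
df = pd.DataFrame({
    "AAA": [0.12, -0.13, 0.23],
    "BBB": [0.05, 0.02, -0.01],
})

# add() and cumprod() apply to each column independently
prices = df.add(1).cumprod()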

Andy Hayden
  • Also worth noting that for this specific example (even with 1000 entries) it is much faster than the `expanding_apply` above. – Andy Hayden Jan 27 '13 at 03:59
  • Ah, updating pandas to 0.10.1 now; that should fix this. – Isaac Jan 27 '13 at 04:18
  • You can do `(c+1).cumprod()`. I'm getting the same error. +1 for Andy's solution; it's much faster than what I proposed. – Zelazny7 Jan 27 '13 at 04:20
  • @Zelazny7 When I typed this, I thought your method might be faster (and I can imagine some examples where it might be), but definitely a good idea to try the simplest (most pythonic?) method first :) – Andy Hayden Jan 27 '13 at 06:12

For readability, I prefer the following solution:

returns = pd.Series([0.12, -.13, 0.23, 0.17, 0.29, -0.11])

initial_value = 100
cum_growth = initial_value * (1 + returns).cumprod()

>>> cum_growth
0    112.000000
1     97.440000
2    119.851200
3    140.225904
4    180.891416
5    160.993360
dtype: float64

If you'd like to include the initial value in the series:

>>> pd.concat([pd.Series(initial_value), cum_growth]).reset_index(drop=True)
0    100.000000
1    112.000000
2     97.440000
3    119.851200
4    140.225904
5    180.891416
6    160.993360
dtype: float64
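
As for the question's follow-up about keeping the returns alongside the mock price: rather than a Series of (returns, closing_price) tuples, the more idiomatic shape is two aligned columns in one DataFrame — a sketch:

import pandas as pd

returns = pd.Series([0.12, -0.13, 0.23, 0.17, 0.29, -0.11])

# two aligned columns instead of a Series of tuples
df = pd.DataFrame({
    "returns": returns,
    "closing_price": (1 + returns).cumprod(),
})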
Alexander