5

I need a rolling_product function, or an expanding_product function.

There are various pandas rolling_XXXX and expanding_XXXX functions, but I was surprised to discover the absence of an expanding_product() function.

To get things working I've been using this rather slow alternative:

pd.expanding_apply(temp_col, lambda x : x.prod())

My arrays often have 32,000 elements so this is proving to be a bit of a bottleneck. I was tempted to try log(), cumsum(), and exp(), but I thought I should ask on here since there might be a much better solution.

Ami Tavory
JasonEdinburgh
  • "There are various numpy rolling_XXXX" - are you sure you mean ``numpy`` and not ``pandas``? – Ami Tavory May 21 '15 at 21:38
  • For the expanding product, there's `cumprod()`. For the rolling version, I think you'll have to use `rolling_apply` to apply `prod()` to each window. – Alex Riley May 21 '15 at 21:39
  • @JasonEdinburgh "log(), cumsum() and exp()" - Do you mean log, rolling_mean, and exp? – Ami Tavory May 21 '15 at 21:42
  • @AmiTavory yes, you are correct, they're pandas functions, not numpy and you are correct I meant to say rolling_mean – JasonEdinburgh May 21 '15 at 21:48
  • @ajcr This is what I have at the moment: pd.expanding_apply(temp_col, lambda x : x.prod()), but as I said, it's very slow with many elements. – JasonEdinburgh May 21 '15 at 21:49
  • @JasonEdinburgh I actually think you meant ``rolling_sum``, unless you meant the geometric mean for the products. – Ami Tavory May 21 '15 at 21:54
  • @AmiTavory Yes!, sorry I'm quite tired, I meant rolling_sum. Thank you – JasonEdinburgh May 21 '15 at 22:00
  • Speaking of tired, [this page](http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html) might help you figure out the numerical stability of your log, rolling-sum, exp scheme, but I'm too tired to go through it. Good luck. – Ami Tavory May 21 '15 at 22:07
  • @AmiTavory You are quite right. I don't think expanding_product would need to perform repeated divisions, but rolling_product certainly would, and this is probably why it was omitted. I just tried np.exp(pd.expanding_sum(np.log(temp_col))) and it is fast enough for my needs at the moment and seems to give results within 0.00001 of the rolling_apply version. If I see it showing up on a profile then I'll try a numba/cython version next. Thanks for your help :) – JasonEdinburgh May 21 '15 at 22:43

2 Answers

6

I have a faster mechanism, though you'll need to run some tests to see if the accuracy is sufficient.

Here's the original exp/sum/log version:

def rolling_prod1(xs, n):
    return np.exp(pd.rolling_sum(np.log(xs), n))

And here's a version that takes the cumulative product, shifts it over (pre-filling with nans), and then divides it back out.

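def rolling_prod2(xs, n):
    cxs = np.cumprod(xs)
    # Divisor: n-1 NaNs (incomplete windows), then 1 so the first full
    # window is cxs[n-1] itself, then the cumulative product lagged by n.
    nans = np.full(n, np.nan)
    nans[n-1] = 1.
    a = np.concatenate((nans, cxs[:len(cxs)-n]))
    return cxs / a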

Both functions return the same result for this example:

In [9]: xs
Out[9]: array([ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.])

In [10]: rolling_prod1(xs, 3)
Out[10]: array([  nan,   nan,    6.,   24.,   60.,  120.,  210.,  336.,  504.])

In [11]: rolling_prod2(xs, 3)
Out[11]: array([  nan,   nan,    6.,   24.,   60.,  120.,  210.,  336.,  504.])

But the second version is much faster:

In [12]: temp_col = np.random.rand(30000)

In [13]: %timeit rolling_prod1(temp_col, 3)
1000 loops, best of 3: 694 µs per loop

In [14]: %timeit rolling_prod2(temp_col, 3)
10000 loops, best of 3: 162 µs per loop
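
Since the accuracy caveat depends on the data, one way to check it is to compare the shift-and-divide result against a brute-force product over each window. This is only a sketch and was not part of the timings above: `rolling_prod_naive` and `max_rel_error` are hypothetical helper names, the test data is an arbitrary choice, and `rolling_prod2` is restated so the block runs on its own.

```python
import numpy as np

def rolling_prod2(xs, n):
    # Shift-and-divide version, restated so this block is self-contained.
    cxs = np.cumprod(xs)
    nans = np.full(n, np.nan)
    nans[n - 1] = 1.0
    a = np.concatenate((nans, cxs[:len(cxs) - n]))
    return cxs / a

def rolling_prod_naive(xs, n):
    # Reference: multiply each window directly (slow, but no division).
    out = np.full(len(xs), np.nan)
    for i in range(n - 1, len(xs)):
        out[i] = np.prod(xs[i - n + 1:i + 1])
    return out

def max_rel_error(xs, n):
    # Largest relative difference over the windows that are fully populated.
    fast = rolling_prod2(xs, n)[n - 1:]
    ref = rolling_prod_naive(xs, n)[n - 1:]
    return np.max(np.abs(fast - ref) / np.abs(ref))

rng = np.random.default_rng(0)
xs = rng.uniform(0.9, 1.1, size=1000)  # assumed data: values near 1
print(max_rel_error(xs, 3))
```

Values near 1 keep the cumulative product well inside floating-point range; data that drifts far from 1 is worth testing separately.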
chrisaycock
  • I was confused when I first read this because I thought "but there isn't a numpy cumprod function". Clearly I was very tired last night. I see that there actually IS a numpy cumprod function and somehow I just failed to find it when I googled for it last night! And since I only need an expanding_prod function, np.cumprod is what I was looking for. But I really like the window and single division approach you've taken for doing the rolling version. So despite my embarrassment for failing to find numpy.cumprod, I'll leave this post here in case your solution is useful to someone else. Thank you! – JasonEdinburgh May 22 '15 at 10:43
2

Early results show that this is a fast-ish approximation for expanding_product:

np.exp(pd.expanding_sum(np.log(temp_col)))
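
For readers on later pandas versions, where the module-level `pd.expanding_sum` is no longer available, the same idea can be sketched with the `.expanding()` method. This is an assumption-laden illustration rather than the code that was timed here: the sample values are made up, and the log trick only works for strictly positive data. `Series.cumprod()` is shown alongside as the direct alternative.

```python
import numpy as np
import pandas as pd

# Illustrative, strictly positive data (log is undefined for values <= 0).
temp_col = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Expanding product via logs: exp of the running sum of logs.
via_logs = np.exp(np.log(temp_col).expanding().sum())

# Expanding product computed directly.
via_cumprod = temp_col.cumprod()

print(np.allclose(via_logs, via_cumprod))
```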

A rolling_product, by contrast, would require repeated divisions, which could lead to numerical instability (as pointed out by @AmiTavory in a now-deleted answer).
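
One concrete way a division-based rolling product can fail, as a sketch with made-up data: if the values are consistently below 1, the cumulative product eventually underflows to zero, and dividing it by a lagged copy of itself gives 0/0. The names `cxs`, `ratio` and `direct` are just for this illustration.

```python
import numpy as np

n = 3
xs = np.full(2000, 0.5)  # assumed data: every window product should be 0.125

cxs = np.cumprod(xs)  # underflows to 0.0 long before the end of the array

# Rolling product by division: cxs[i + n] / cxs[i] is the product of the
# n values xs[i + 1] .. xs[i + n].
with np.errstate(invalid="ignore", divide="ignore"):
    ratio = cxs[n:] / cxs[:-n]

# Product of the last window via logs, which never forms the huge
# (or tiny) cumulative product.
direct = np.exp(np.log(xs[-n:]).sum())

print(ratio[0])   # an early window is fine
print(ratio[-1])  # a late window is nan (0 / 0)
print(direct)     # close to 0.125
```

The log-based scheme avoids this particular failure, at the cost of requiring strictly positive inputs.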

JasonEdinburgh