Cumulative product for each element over x days in pandas DataFrame

Question

I try to calculate for each element in a Dataframe df1 the cumulative product over x days (for example 3). Is there an efficient way to do that? So far, I only know to generate a normal cumulative product over the rows, without fixing the lookback period to x days (=df2). For example:

2022-01-02 ID1: 1.0528 = (ignoring NaN) 0.94 * 1.12
2022-01-05 ID1: 1.2002 = 0.94 * 1.12 * 1.14
2022-01-09 ID1: 1.4045 = 1.12 * 1.14 * 1.10

Optimally, it would also ignore NaN values and calculate the cumulative return over the remaining numbers. So far I only know how to compute that by creating a copy and shifting it, but I would like to do calculate the cumulative product for many columns over 50 rows, which would not be efficient.

df1:
            ID1     ID2
Date        
2022-01-02  NaN     0.95
2022-01-05  0.94    0.98
2022-01-09  1.12    NaN
2022-01-10  1.14    1.02
2022-01-11  1.10    1.00
2022-01-12  0.92    0.82

df2:
            ID1     ID2
Date        
2022-01-02  1.0528  0.9309
2022-01-05  1.2002  0.9996
2022-01-09  1.4045  1.0200
2022-01-10  1.1537  0.8364
2022-01-11  1.0120  0.8200
2022-01-12  0.9200  0.8200

For reproducability:

import pandas as pd
import numpy as np
df1 = pd.DataFrame({
    'Date':['2022-01-02', '2022-01-05', '2022-01-09', '2022-01-10', '2022-01-11', '2022-01-12'],
    'ID1':[np.nan, 0.94, 1.12, 1.14, 1.1, 0.92], 
    'ID2':[0.95, 0.98, np.nan, 1.02, 1, 0.82]})
df1 = df1.set_index('Date')

Thanks a lot for you suggestion!

So here with "three days" you actually mean three rows? Because the examples are using rows which are more than three days in total if you consider the date — user2246849, May 23 '22 at 12:08
I would approach this using a [rolling window](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.rolling.html) then maybe a custom aggregation method to compute the product of the elements, maybe using [np.prod](https://numpy.org/doc/stable/reference/generated/numpy.prod.html) — SpaceBurger, May 23 '22 at 12:08

user2246849 · Answer 1 · 2022-05-23T13:18:49.370

4

You can use a custom rolling indexer for forward-looking windows and np.nanprod:

import numpy as np

# Window of index i includes rows [i:i+3).
indexer = pd.api.indexers.FixedForwardWindowIndexer(window_size=3)

print(df.rolling(indexer, min_periods=1).apply(np.nanprod))

                 ID1     ID2
Date                        
2022-01-02  1.052800  0.9310
2022-01-05  1.200192  0.9996
2022-01-09  1.404480  1.0200
2022-01-10  1.153680  0.8364
2022-01-11  1.012000  0.8200
2022-01-12  0.920000  0.8200

edited May 23 '22 at 13:18

answered May 23 '22 at 12:17

user2246849

4,217
1
12
16

1

A one-liner would be `df1[::-1].rolling(3, 1).apply(np.prod)[::-1]` – SpaceBurger May 23 '22 at 12:24
@SpaceBurger yes, but with `fillna(1)` (see mozway's answer) – user2246849 May 23 '22 at 12:28
1

@user2246849 no, fillna doesn't seem to be necessary, i get the same results and there is no need of using a lambda function with this ; but maybe i should add it to the new answer now that it has been accepted – SpaceBurger May 23 '22 at 12:41

score 2 · Accepted Answer · answered May 23 '22 at 12:21

2

You can use:

df1.fillna(1)[::-1].rolling(window=3, min_periods=1).agg(lambda x: x.prod())[::-1]

output:

                 ID1     ID2
Date                        
2022-01-02  1.052800  0.9310
2022-01-05  1.200192  0.9996
2022-01-09  1.404480  1.0200
2022-01-10  1.153680  0.8364
2022-01-11  1.012000  0.8200
2022-01-12  0.920000  0.8200

How does it work?

rolling uses the previous rows (or is centered), here we reverse the array to compute an inverse rolling
We fill the NaNs with 1 (we could also use np.nanprod)
We use min_periods=1 to enable computation on less than 3 elements

answered May 23 '22 at 12:21

mozway

194,879
13
39
75

1

@user2246849 I also like the custom indexer, too bad it is not included as parameter (like center) – mozway May 23 '22 at 12:25
Adding this to the accepted anwser : `df1[::-1].rolling(3, 1).apply(np.prod)[::-1]` gives the same result but may be less readable. – SpaceBurger May 23 '22 at 12:46
1

@SpaceBurger correct, actually `np.prod` ignores the NaNs if the input is a Series. – mozway May 23 '22 at 12:48

Cumulative product for each element over x days in pandas DataFrame

2 Answers2

Linked