3

I try to calculate for each element in a Dataframe df1 the cumulative product over x days (for example 3). Is there an efficient way to do that? So far, I only know to generate a normal cumulative product over the rows, without fixing the lookback period to x days (=df2). For example:

  • 2022-01-02 ID1: 1.0528 = (ignoring NaN) 0.94 * 1.12
  • 2022-01-05 ID1: 1.2002 = 0.94 * 1.12 * 1.14
  • 2022-01-09 ID1: 1.4045 = 1.12 * 1.14 * 1.10

Optimally, it would also ignore NaN values and calculate the cumulative return over the remaining numbers. So far I only know how to compute that by creating a copy and shifting it, but I would like to do calculate the cumulative product for many columns over 50 rows, which would not be efficient.

df1:
            ID1     ID2
Date        
2022-01-02  NaN     0.95
2022-01-05  0.94    0.98
2022-01-09  1.12    NaN
2022-01-10  1.14    1.02
2022-01-11  1.10    1.00
2022-01-12  0.92    0.82

df2:
            ID1     ID2
Date        
2022-01-02  1.0528  0.9309
2022-01-05  1.2002  0.9996
2022-01-09  1.4045  1.0200
2022-01-10  1.1537  0.8364
2022-01-11  1.0120  0.8200
2022-01-12  0.9200  0.8200

For reproducability:

import pandas as pd
import numpy as np
df1 = pd.DataFrame({
    'Date':['2022-01-02', '2022-01-05', '2022-01-09', '2022-01-10', '2022-01-11', '2022-01-12'],
    'ID1':[np.nan, 0.94, 1.12, 1.14, 1.1, 0.92], 
    'ID2':[0.95, 0.98, np.nan, 1.02, 1, 0.82]})
df1 = df1.set_index('Date')

Thanks a lot for you suggestion!

fjurt
  • 783
  • 3
  • 14
  • So here with "three days" you actually mean three rows? Because the examples are using rows which are more than three days in total if you consider the date – user2246849 May 23 '22 at 12:08
  • I would approach this using a [rolling window](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.rolling.html) then maybe a custom aggregation method to compute the product of the elements, maybe using [np.prod](https://numpy.org/doc/stable/reference/generated/numpy.prod.html) – SpaceBurger May 23 '22 at 12:08

2 Answers2

4

You can use a custom rolling indexer for forward-looking windows and np.nanprod:

import numpy as np

# Window of index i includes rows [i:i+3).
indexer = pd.api.indexers.FixedForwardWindowIndexer(window_size=3)

print(df.rolling(indexer, min_periods=1).apply(np.nanprod))
                 ID1     ID2
Date                        
2022-01-02  1.052800  0.9310
2022-01-05  1.200192  0.9996
2022-01-09  1.404480  1.0200
2022-01-10  1.153680  0.8364
2022-01-11  1.012000  0.8200
2022-01-12  0.920000  0.8200
user2246849
  • 4,217
  • 1
  • 12
  • 16
  • 1
    A one-liner would be `df1[::-1].rolling(3, 1).apply(np.prod)[::-1]` – SpaceBurger May 23 '22 at 12:24
  • @SpaceBurger yes, but with `fillna(1)` (see mozway's answer) – user2246849 May 23 '22 at 12:28
  • 1
    @user2246849 no, fillna doesn't seem to be necessary, i get the same results and there is no need of using a lambda function with this ; but maybe i should add it to the new answer now that it has been accepted – SpaceBurger May 23 '22 at 12:41
2

You can use:

df1.fillna(1)[::-1].rolling(window=3, min_periods=1).agg(lambda x: x.prod())[::-1]

output:

                 ID1     ID2
Date                        
2022-01-02  1.052800  0.9310
2022-01-05  1.200192  0.9996
2022-01-09  1.404480  1.0200
2022-01-10  1.153680  0.8364
2022-01-11  1.012000  0.8200
2022-01-12  0.920000  0.8200

How does it work?

  • rolling uses the previous rows (or is centered), here we reverse the array to compute an inverse rolling
  • We fill the NaNs with 1 (we could also use np.nanprod)
  • We use min_periods=1 to enable computation on less than 3 elements
mozway
  • 194,879
  • 13
  • 39
  • 75
  • 1
    @user2246849 I also like the custom indexer, too bad it is not included as parameter (like center) – mozway May 23 '22 at 12:25
  • Adding this to the accepted anwser : `df1[::-1].rolling(3, 1).apply(np.prod)[::-1]` gives the same result but may be less readable. – SpaceBurger May 23 '22 at 12:46
  • 1
    @SpaceBurger correct, actually `np.prod` ignores the NaNs if the input is a Series. – mozway May 23 '22 at 12:48