1

I try to create a new column CumReturnin a Dataframe dfwith the cumulative product over the month. I try to reset the cum_prod() to 1 at the end of each month (if EndMonth == 1) and start new with the cumulative product.

df:
Date        EndMonth  ID1  Return
2023-01-30     0       A     0.95
2023-01-30     0       B     0.98
2023-01-31     1       A     1.01
2023-01-31     1       B     1.02
2023-02-01     0       A     1.05
2023-02-01     0       B     0.92
2023-02-02     0       A     0.97
2023-02-02     0       B     0.99

I tried it to do with: df['CumReturn'] = np.where(df['EndMonth'] == 1, 1, df['Return'].groupby('ID1').cumprod())

When I do that, I get for 2023-02-02 the cumulative product over the whole period and not only since the start of February.

For reproducability:

import pandas as pd
df1 = pd.DataFrame({
    'Date':['2023-01-30', '2023-01-30', '2023-01-31', '2023-01-31', '2023-02-01', '2023-02-01', '2023-02-02', '2023-02-02'],
    'EndMonth':[0, 0, 1, 1, 0, 0, 0, 0],
    'ID1':['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B'],
    'Return':[0.95, 0.98, 1.01, 1.02, 1.05, 0.92, 0.97, 0.99]})
df1 = df1.set_index('Date')

Many thanks!

fjurt
  • 783
  • 3
  • 14
  • Can you provide the expected output for clarity? – mozway Jan 23 '23 at 10:57
  • 1
    Does this answer your question? [How to group and count rows by month and year using Pandas?](https://stackoverflow.com/questions/38792122/how-to-group-and-count-rows-by-month-and-year-using-pandas) – Thomas Weller Jan 23 '23 at 11:52

2 Answers2

1

Looks like you want:

# set up grouper per ID and month
g = df1.groupby(['ID1', pd.to_datetime(df1.index).month])['Return']

# get cumprod per month with starting value being 1
# (we divide by the first value)
df1['CumProd'] = g.cumprod().div(g.transform('first'))

Output:

            EndMonth ID1  Return  CumProd
Date                                     
2023-01-30         0   A    0.95     1.00
2023-01-30         0   B    0.98     1.00
2023-01-31         1   A    1.01     1.01
2023-01-31         1   B    1.02     1.02
2023-02-01         0   A    1.05     1.00
2023-02-01         0   B    0.92     1.00
2023-02-02         0   A    0.97     0.97
2023-02-02         0   B    0.99     0.99
mozway
  • 194,879
  • 13
  • 39
  • 75
1

IIUC use:

df1.index = pd.to_datetime(df1.index)

df1['CumReturn'] = (df1['Return'].add(1)
                                 .groupby([df1['ID1'], pd.Grouper(freq='m')])
                                 .cumprod()
                                 .sub(1))
print (df1)
            EndMonth ID1  Return  CumReturn
Date                                       
2023-01-30         0   A    0.95     0.9500
2023-01-30         0   B    0.98     0.9800
2023-01-31         1   A    1.01     2.9195
2023-01-31         1   B    1.02     2.9996
2023-02-01         0   A    1.05     1.0500
2023-02-01         0   B    0.92     0.9200
2023-02-02         0   A    0.97     3.0385
2023-02-02         0   B    0.99     2.8208
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252