3

I am trying to sum values in a column and reset at a certain month every year. I have checked the following links which have been helpful, but I still can't seem to find the answer that points me in the right direction.

Cumulative sum at intervals Reset Cumulative sum base on condition Pandas Conditional count of cumulative sum Dataframe - Loop through columns Pandas: conditional rolling count

The link that has been the closest to what I am looking for (Pyspark : Cumulative Sum with reset condition), but I can't figure out how to convert it from PySpark to Pandas (or another Python method.

raw_data = {'change_value': [-6, -13, -19, -82, -25, -39, -27, 0, 8, 32, 55, 94, 75, 77], 
        'cumu_value': [-6, -19, -38, -120, -145, -184, -211, -211, -203, -171, -116, -22, 75, 130], 
        'month': [10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
        'date': ['2017-10','2017-11','2017-12','2018-01','2018-02','2018-03'
                 ,'2018-04','2018-05','2018-06','2018-07','2018-08','2018-09',
                 '2018-10', '2018-11']}

df = pd.DataFrame(raw_data, columns = ['change_value', 'cumu_value', 'month', 'date'])

df

df.loc[df['month'] == '10', ['cumu_value']] = df['change_value']

df['cumu_value'] = df.change_value.cumsum() 

change_value  cumu_value  month     date
0             -6     -6     10  2017-10
1            -13    -19     11  2017-11
2            -19    -38     12  2017-12
3            -82   -120      1  2018-01
4            -25   -145      2  2018-02
5            -39   -184      3  2018-03
6            -27   -211      4  2018-04
7              0   -211      5  2018-05
8              8   -203      6  2018-06
9             32   -171      7  2018-07
10            55   -116      8  2018-08
11            94    -22      9  2018-09
12            75     75     10  2018-10  <<<< every October I would like the to cumu_value to reset - to that month's change_value
13            77    130     11  2018-11 <<< for some reason the cumu_value adds all the values for all the months rather than just the value for 2018-10 and 2018-11
user10756193
  • 183
  • 1
  • 1
  • 8

1 Answers1

2

Create groups where the group_id changes every October. Then cumsum within each group, effectively resetting it every October.

df['cumu_value'] = df.groupby(df.month.eq(10).cumsum()).change_value.cumsum()

Output:

    change_value  cumu_value  month     date
0             -6          -6     10  2017-10
1            -13         -19     11  2017-11
2            -19         -38     12  2017-12
3            -82        -120      1  2018-01
4            -25        -145      2  2018-02
5            -39        -184      3  2018-03
6            -27        -211      4  2018-04
7              0        -211      5  2018-05
8              8        -203      6  2018-06
9             32        -171      7  2018-07
10            55        -116      8  2018-08
11            94         -22      9  2018-09
12            75          75     10  2018-10
13            77         152     11  2018-11

As an illustration, we are grouping rows together as follows:

print(df.month.eq(10).cumsum())
0     1
1     1
2     1
3     1
4     1
5     1
6     1
7     1
8     1
9     1
10    1
11    1
12    2
13    2
Name: month, dtype: int32

so we cumsum the first 12 rows separately from the last 2 rows.

ALollz
  • 57,915
  • 7
  • 66
  • 89