1

I need to perform a cumulative sum on a data frame that is grouped, but I need to have it reset when the previous value is negative and the current value is positive.

In R I could apply a condition to the groupby with ave() function, but I can't do that in python, so I am having a bit of trouble thinking of a solution. Can anyone help me out?

Here is a sample:

import pandas as pd

df = pd.DataFrame({'PRODUCT': ['A'] * 40, 'GROUP': ['1'] * 40, 'FORECAST': [100, -40, -40, -40]*10, })

df['CS'] = df.groupby(['GROUP', 'PRODUCT']).FORECAST.cumsum()

# Reset cumsum if
# condition: (df.FORECAST > 0) & (df.groupby(['GROUP', 'PRODUCT']).FORECAST.shift(-1).fillna(0) <= 0)
automa7
  • 494
  • 4
  • 15

1 Answers1

1

This solution will work to reset the sum for any example where the values to be summed change from negative to positive (regardless of whether the dataset is nice and periodic as it is in your example)

import numpy as np
import pandas as pd

df = pd.DataFrame({'PRODUCT': ['A'] * 40, 'GROUP': ['1'] * 40, 'FORECAST': [100, -40, -40, -40]*10, })

cumsum = np.cumsum(df['FORECAST'])

# Array of indices where sum should be reset
reset_ind = np.where(df['FORECAST'].diff() > 0)[0]

# Sums that need to be subtracted at resets
subs = cumsum[reset_ind-1].values

# Repeat subtraction values for every entry BETWEEN resets and values after final reset
rep_subs = np.repeat(subs, np.hstack([np.diff(reset_ind), df['FORECAST'].size - reset_ind[-1]]))

# Stack together values before first reset and resetted sums
df['CS'] = np.hstack([cumsum[:reset_ind[0]], cumsum[reset_ind[0]:] - rep_subs])

Alternatively, based on this solution to a similar question (and my realisation of the usefulness of groupby)

import pandas as pd
import numpy as np

df = pd.DataFrame({'PRODUCT': ['A'] * 40, 'GROUP': ['1'] * 40, 'FORECAST': [100, -40, -40, -40]*10, })

# Create indices to group sums together
df['cumsum'] = (df['FORECAST'].diff() > 0).cumsum()

# Perform group-wise cumsum
df['CS'] = df.groupby(['cumsum'])['FORECAST'].cumsum()

# Remove intermediary cumsum column
df = df.drop(['cumsum'], axis=1)
jwalton
  • 5,286
  • 1
  • 18
  • 36
  • Thank you, it seem to have worked once I added the column GROUP and PRODUCT to the groupby, since if I don't add it the cumsum may overlap between products. – automa7 Mar 01 '19 at 13:53