1

I have the following dataframe in pandas:

   A     B
   5     0
   5     10
   5     0
   5     0
   5     50

I would like to calculate a column C as the cumulative difference between B and A but with at the condition that C cannot be lower than 0 and higher than 15. How could I obtain the following result with the cumsum function?

   A     B     C
   5     0     0
   5     10    5
   5     0     0
   5     0     0
   5     50    15
RomB
  • 295
  • 3
  • 17

4 Answers4

2

@郑远程 has the correct approach (this answer should be accepted), as the desired operation requires a loop and cannot be vectorized:

Here is a just variant using a function and numba to improve efficiency for those interested:

import numba as nb

@nb.jit(nopython=True)
def cumreset(s):
    out = []
    tot = 0
    for v in s:
        tot = max(0, min(tot+v, 15))
        out.append(tot)
    return out

df['C'] = cumreset(df['B'].sub(df['A']).to_numpy())

Output:

   A   B   C
0  5   0   0
1  5  10   5
2  5   0   0
3  5   0   0
4  5  50  15

Comparison of the python and numba loops:

enter image description here

mozway
  • 194,879
  • 13
  • 39
  • 75
  • could a variation of this question provide a vectorized solution? https://stackoverflow.com/questions/65223420/cumsum-on-pandas-df-with-reset-to-zero-for-negative-cumulative-values/65223451#65223451 – rhug123 Aug 03 '23 at 21:23
1

Oops! My apologies for that oversight. The issue is occurring because the cumulative_diff function is returning a list within the expanding().apply() method, causing a type mismatch.

To fix this, we can change our approach slightly by using a loop to calculate the cumulative sum. Here's the corrected code:

import pandas as pd

df = pd.DataFrame({
    'A': [5, 5, 5, 5, 5],
    'B': [0, 10, 0, 0, 50]
})

# Calculate the difference between B and A
df['diff'] = df['B'] - df['A']

# Initialize a variable to hold the cumulative difference
cum_diff = 0

# Create an empty list to hold the results
results = []

# Loop through the differences, calculate the cumulative sum, and append to the results
for value in df['diff']:
    cum_diff += value
    cum_diff = max(0, cum_diff)  # Reset to 0 if cumulative difference is less than 0
    cum_diff = min(15, cum_diff) # Reset to 15 if cumulative difference is more than 15
    results.append(cum_diff)

# Assign the results to the new 'C' column
df['C'] = results

# Remove the temporary difference column
df.drop('diff', axis=1, inplace=True)

print(df)

This will yield the desired output:

   A   B   C
0  5   0   0
1  5  10   5
2  5   0   0
3  5   0   0
4  5  50  15

I hope this corrected version works for you, and I'm sorry again for the confusion.

郑远程
  • 44
  • 5
  • TY very much for this answer. I think you are very close from what I wanted but I have an error when trying the expanding function in your code : 'TypeError: must be real number, not list' – RomB Aug 03 '23 at 13:44
  • sorry!i have edited again,you run try it – 郑远程 Aug 03 '23 at 13:51
  • You can't use vectorial code with a sum reset, a loop like in this answer is the correct approach – mozway Aug 03 '23 at 15:05
0

I am not sure that your desired output corresponds to your text. I have followed your text above:

df['diff'] = df['B'] - df['A']
df['diff'] = np.where((df['diff']>0) & (df['diff']<15), df['diff'], 0 )
df['C'] = df['diff'].cumsum()
df

   A   B  diff  C
0  5   0     0  0
1  5  10     5  5
2  5   0     0  5
3  5   0     0  5
4  5  50     0  5
gtomer
  • 5,643
  • 1
  • 10
  • 21
0

Try this:

n = df['B'].cumsum().sub(df['A'].cumsum())

df.assign(C = df['A'].groupby(n.lt(0).iloc[::-1].cumsum()).cumsum().rsub(df['B'].cumsum()).clip(0,15).where(n.ge(0),0))

Output:

   A   B   C
0  5   0   0
1  5  10   5
2  5   0   0
3  5   0   0
4  5  50  15
rhug123
  • 7,893
  • 1
  • 9
  • 24
  • 1
    To address your comment to my answer, I really don't think a vectorial method can be used. This one fails as soon as the dataframes is a bit longer. Try: `N = 10 ; np.random.seed(0) ; df = pd.DataFrame({'A': np.random.randint(0, 50, size=N), 'B': np.random.randint(0, 50, size=N)})`. To have a working solution you would need to make recursive/iterative groups, which would be inefficient. – mozway Aug 04 '23 at 04:48