
I've got success/failure data on several simulations. Each simulation consists of several trials and I want a cumulative sum of the successes per simulation. Here's an example of my data:

import pandas as pd

data = pd.DataFrame([[0, 0, 0],
                     [0, 1, 0],
                     [0, 2, 1],
                     [0, 3, 0],
                     [1, 0, 1],
                     [1, 1, 0],
                     [1, 2, 0],
                     [1, 3, 1],
                     [2, 0, 0],
                     [2, 1, 1],
                     [2, 2, 1],
                     [2, 3, 1],
                     [0, 0, 0],
                     [0, 1, 1],
                     [0, 2, 1],
                     [0, 3, 0]],
                   columns=['simulation', 'trial', 'success'])

Using this answer, I came up with the following code, but it isn't quite working and I can't figure out why.

cumsum = data['success'].cumsum()
reset = -cumsum[data['trial'] == 0].diff().fillna(cumsum)
data['cumsum'] = data['success'].where(data['trial'] != 0, reset).cumsum()

The resulting column is [0, 0, 1, 1, -1, -1, -1, 0, -1, 0, 1, 2, -1, 0, 1, 1], but I expect [0, 0, 1, 1, 1, 1, 1, 2, 0, 1, 2, 3, 0, 1, 2, 2].
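The reset code above goes wrong in two places: `cumsum[data['trial'] == 0].diff()` measures the successes from just after the previous trial 0 up to and including the current one (not the total of the previous run), and `where` replaces the trial-0 row's own success instead of keeping it. Below is a minimal sketch of a corrected version of the same idea, still assuming that `trial == 0` marks the start of each run. It measures the carry from the running total *before* each row and subtracts it from that row's success. The names `starts`, `before`, and `carry` are illustrative only.

```python
import pandas as pd

# Same data as in the question, written column-wise.
data = pd.DataFrame({
    'simulation': [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 0, 0, 0, 0],
    'trial':      [0, 1, 2, 3] * 4,
    'success':    [0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0],
})

success = data['success']
starts = data['trial'] == 0            # rows where a new run begins
before = success.cumsum() - success    # running total excluding the current row

# Total accumulated by the previous run, recorded at each run start
# (the first start carries whatever came before it, here 0).
carry = before[starts].diff().fillna(before)

# Subtract the carry at run starts (keeping that row's own success),
# leave other rows unchanged, then take the running total.
data['cumsum'] = success.sub(carry, fill_value=0).cumsum().astype(int)
print(data['cumsum'].tolist())
# [0, 0, 1, 1, 1, 1, 1, 2, 0, 1, 2, 3, 0, 1, 2, 2]
```

Because the run boundaries come from `trial`, this variant does not care whether a simulation number repeats.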

Greg

1 Answer


You can group by consecutive runs of 'simulation' and then cumsum the 'success' within each group.

data.groupby(data.simulation.ne(data.simulation.shift()).cumsum())['success'].cumsum()

or

data.groupby((data.simulation!=data.simulation.shift()).cumsum())['success'].cumsum()
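Because simulation 0 appears twice in the data, a plain `groupby('simulation')` would put both runs in one group and carry the first run's total into the second. The sketch below, using the question's data, compares that with grouping by consecutive runs as in the answer; `run_id` is an illustrative name.

```python
import pandas as pd

data = pd.DataFrame({
    'simulation': [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 0, 0, 0, 0],
    'trial':      [0, 1, 2, 3] * 4,
    'success':    [0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0],
})

# Plain groupby: both runs of simulation 0 share one group, so the total carries over.
by_value = data.groupby('simulation')['success'].cumsum()

# Group by consecutive runs: a new group id starts whenever 'simulation' changes.
run_id = data['simulation'].ne(data['simulation'].shift()).cumsum()
by_run = data.groupby(run_id)['success'].cumsum()

print(by_value.tolist())  # [0, 0, 1, 1, 1, 1, 1, 2, 0, 1, 2, 3, 1, 2, 3, 3]
print(by_run.tolist())    # [0, 0, 1, 1, 1, 1, 1, 2, 0, 1, 2, 3, 0, 1, 2, 2]
```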
moys
  • The simulation number will repeat itself. Question updated to clarify. – Greg Jan 06 '20 at 02:26
  • That does the trick I think. Thank you. Would you mind explaining how that works? – Greg Jan 06 '20 at 02:42
  • Both `data.simulation.ne(data.simulation.shift()).cumsum()` and `(data.simulation!=data.simulation.shift()).cumsum()` produce a Series that assigns one number to each consecutive run of identical values in the `simulation` column (run either expression on its own to see the output). We then use this Series to group the DataFrame and take the `cumsum` of the `success` column within each group. If the answer helped you, consider upvoting and accepting it. – moys Jan 06 '20 at 02:50
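The grouping key described in the comment above can be traced step by step. This sketch uses only the `simulation` column from the question; the intermediate names `prev`, `changed`, and `run_id` are illustrative.

```python
import pandas as pd

sim = pd.Series([0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 0, 0, 0, 0], name='simulation')

prev = sim.shift()         # previous row's value; NaN in the first row
changed = sim.ne(prev)     # True where a new run of identical values begins
run_id = changed.cumsum()  # each True bumps the counter, giving one id per run

# The two spellings in the answer build the same key.
assert run_id.equals((sim != sim.shift()).cumsum())

print(changed.astype(int).tolist())  # [1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0]
print(run_id.tolist())               # [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4]
```

The second run of simulation 0 gets id 4 rather than reusing id 1, which is why its cumulative sum restarts at zero.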