1

I have a pandas dataframe of the following form:

id   n_items  paid
 1         5     1
 1         2     0
 1         6     0
 1         1     1
 1         3     0
 1         8     0
 2         7     0
 2         7     1
 3         1     1
 3         5     1
 3         2     1

'id' specifies a customer ID, 'n_items' tells how many items have been bought during a particular transaction, and 'paid' tells if the customer has paid for that purchase. For every customer ID the rows are chronological.

For every row, I want to determine how many items have been purchased in total since the customer has last paid. The output should look like this:

id   n_items  paid   cum_days
 1         5     1          5
 1         2     0          2
 1         6     0          8
 1         1     1          9
 1         3     0          3
 1         8     0         11
 2         7     0          7 
 2         7     1         14
 3         1     1          1
 3         5     1          5
 3         2     1          2

I have found this post, which addresses a similar question, but I haven't managed modifying it for this purpose.

DominikS
  • 231
  • 3
  • 12

1 Answers1

4

Use cumsum to calculate a group variable to identify the pattern (since last paid) and then calculate the cumsum of n_items for each id and unpaid period:

df['cumdays'] = df.groupby([
                    df.id, df.paid.cumsum().shift().fillna(0)
                ]).n_items.cumsum()
df

enter image description here

Psidom
  • 209,562
  • 33
  • 339
  • 356