How to edit running total column to restart with every new column value?

Question

I have the data frame pictured below. I need the 'Total #' column to restart every time there is a new value in the 'Item Number' column. For example, if Index 4 was the last occurrence of 104430-003 then 14 would be the last 'Total #' and it would start recounting the 'Total #' of VTHY-039 in the appropriate 'Bin Loc.'.

Once I figure out that part my final step is to drop any of the same remaining 'Item Numbers' after the 'Total #' is equal or greater than the PV Pick #.

Code

score 0 · Answer 1 · edited Apr 02 '21 at 15:20

pv['cumsum'] = pv.groupby('Item Number')['Items'].transform(pd.Series.cumsum)

pv

    Item Number Bin Loc.    PV Pick Items   cumsum
0   104430-003  A-P28-17B   4   2   2
1   104430-003  A-P39-20B   4   4   6
2   104430-003  A-P39-20C   4   1   7
3   104430-003  A-P39-26C   4   2   9
4   104430-003  A-P40-23C   4   5   14
... ... ... ... ... ...
829 VTHY-039    A-P45-09B   1   2   36
830 VTHY-039    A-P45-13B   1   2   38
831 VTHY-039    A-P45-19B   1   2   40
832 VTHY-039    A-P45-21B   1   3   43
833 VTHY-039    A-P46-21B   1   2   45

Blake · Answer 2 · 2021-04-02T21:13:40.617

You can accomplish all of this with .loc. First, loop through all of the unique item numbers and use .loc to transform a subset of columns. Then use .loc to select only rows where Total < PV Pick.

import pandas as pd

df = pd.DataFrame({'Item Number': ['104430-003', '104430-003', '104430-003', 'VTHY-039', 'VTHY-039', 'VTHY-039', 'VTHY-039'],
                  'PV Pick': [4, 4, 4, 1, 1, 1, 1],
                  'Items': [2, 4, 1, 2, 2, 2, 3]})

items = df['Item Number'].unique()
for item in items:
    df.loc[df['Item Number'] == item, 'Total'] = df.loc[df['Item Number'] == item, 'Items'].cumsum()

df = df.loc[df['Total'] < df['PV Pick']]

This is what the output looks like after running the for loop:

And after selecting rows where Total < PV Pick:

EDIT:

To answer your comment, in the loop you can use .loc to find rows where 'Total' >= 'PV Pick', grab just the first row with iloc, and append these values to breakeven_final. Then you can concat df with df_breakeven_final.

breakeven_final = []
for item in items:
    df.loc[df['Item Number'] == item, 'Total'] = df.loc[df['Item Number'] == item, 'Items'].cumsum()
    breakeven = df.loc[df['Item Number'] == item].loc[df['Total'] >= df['PV Pick']].iloc[0]
    breakeven_final.append(breakeven.values)

df_breakeven_final = pd.DataFrame(breakeven_final, columns=df.columns.values)

df = df.loc[df['Total'] < df['PV Pick']]

df_final = pd.concat([df, df_breakeven_final]).reset_index(drop=True)

Output of df_final (note the index has been reset):

This is so close! The loop is awesome and far more efficient than the answer I came up with but what I need is for the final output to show each row for each 'Item Number' UNTIL the 'Total' value is greater than or equal to the 'PV Pick' value. Then I want to delete all subsequent rows for that particular 'Item Number'. So, for instance on your first output, it would show index 0, 1, and 3. — Kate, Apr 02 '21 at 16:28

How to edit running total column to restart with every new column value?

2 Answers2