0

I have the data frame pictured below. I need the 'Total #' column to restart every time there is a new value in the 'Item Number' column. For example, if Index 4 was the last occurrence of 104430-003 then 14 would be the last 'Total #' and it would start recounting the 'Total #' of VTHY-039 in the appropriate 'Bin Loc.'.

Once I figure out that part my final step is to drop any of the same remaining 'Item Numbers' after the 'Total #' is equal or greater than the PV Pick #.

Code

Kate
  • 1
  • 1

2 Answers2

0
pv['cumsum'] = pv.groupby('Item Number')['Items'].transform(pd.Series.cumsum)

pv

    Item Number Bin Loc.    PV Pick Items   cumsum
0   104430-003  A-P28-17B   4   2   2
1   104430-003  A-P39-20B   4   4   6
2   104430-003  A-P39-20C   4   1   7
3   104430-003  A-P39-26C   4   2   9
4   104430-003  A-P40-23C   4   5   14
... ... ... ... ... ...
829 VTHY-039    A-P45-09B   1   2   36
830 VTHY-039    A-P45-13B   1   2   38
831 VTHY-039    A-P45-19B   1   2   40
832 VTHY-039    A-P45-21B   1   3   43
833 VTHY-039    A-P46-21B   1   2   45
Anurag Dabas
  • 23,866
  • 9
  • 21
  • 41
Kate
  • 1
  • 1
0

You can accomplish all of this with .loc. First, loop through all of the unique item numbers and use .loc to transform a subset of columns. Then use .loc to select only rows where Total < PV Pick.

import pandas as pd

df = pd.DataFrame({'Item Number': ['104430-003', '104430-003', '104430-003', 'VTHY-039', 'VTHY-039', 'VTHY-039', 'VTHY-039'],
                  'PV Pick': [4, 4, 4, 1, 1, 1, 1],
                  'Items': [2, 4, 1, 2, 2, 2, 3]})

items = df['Item Number'].unique()
for item in items:
    df.loc[df['Item Number'] == item, 'Total'] = df.loc[df['Item Number'] == item, 'Items'].cumsum()

df = df.loc[df['Total'] < df['PV Pick']]

This is what the output looks like after running the for loop: display(df) output

And after selecting rows where Total < PV Pick: enter image description here

EDIT:

To answer your comment, in the loop you can use .loc to find rows where 'Total' >= 'PV Pick', grab just the first row with iloc, and append these values to breakeven_final. Then you can concat df with df_breakeven_final.

breakeven_final = []
for item in items:
    df.loc[df['Item Number'] == item, 'Total'] = df.loc[df['Item Number'] == item, 'Items'].cumsum()
    breakeven = df.loc[df['Item Number'] == item].loc[df['Total'] >= df['PV Pick']].iloc[0]
    breakeven_final.append(breakeven.values)

df_breakeven_final = pd.DataFrame(breakeven_final, columns=df.columns.values)

df = df.loc[df['Total'] < df['PV Pick']]

df_final = pd.concat([df, df_breakeven_final]).reset_index(drop=True)

Output of df_final (note the index has been reset): enter image description here

Blake
  • 103
  • 1
  • 5
  • This is so close! The loop is awesome and far more efficient than the answer I came up with but what I need is for the final output to show each row for each 'Item Number' UNTIL the 'Total' value is greater than or equal to the 'PV Pick' value. Then I want to delete all subsequent rows for that particular 'Item Number'. So, for instance on your first output, it would show index 0, 1, and 3. – Kate Apr 02 '21 at 16:28
  • I've edited my post to address your comment. – Blake Apr 02 '21 at 21:14