Accumulate values according to condition in a Pandas data frame

Question

I have a Pandas data frame (to illustrate the expected behavior) as follow:

df = pd.DataFrame({
        'Id': ['001', '001', '002', '002'],
        'Date': ['2013-01-07', '2013-01-14', '2013-01-07', '2013-01-14'],
        'Purchase_Quantity': [12, 13, 10, 6],
        'lead_time': [4, 2, 6, 4],
        'Order_Quantity': [21, 34, 21, 13]
    })
    df['Date'] = pd.to_datetime(df['Date'])
    df = df.groupby(['Id', 'Date']).agg({
        'Purchase_Quantity': sum,
        'lead_time': sum,
        'Order_Quantity': sum})

                Purchase_Quantity  lead_time  Order_Quantity
Id  Date                                                    
001 2013-01-07                 12          4              21
    2013-01-14                 13          2              34
002 2013-01-07                 10          6              21
    2013-01-14                  6          4              13

Where lead_time is a duration in days.

I would like to add a column that keep track of the "quantity on hand" which is:

Remaining quantity from previous weeks
Plus ordered quantity that are finally available
Minus purchased quantity of the current week

The expected result should be:

                Purchase_Quantity  lead_time  Order_Quantity  OH
Id  Date                                                    
001 2013-01-07                 12          4              21   0
    2013-01-14                 13          2              34   9
002 2013-01-07                 10          6              21   0
    2013-01-14                  6          4              13  11

Your error comes from your comparison `df['lead_time`] < 1`. `df['lead_time`]` is a `Series` (a pandas list). Comparing to 1 gives you a series of boolean (some are True, other False), hence you cannot use it as is for your `if` condition. — AlexisBRENON, Nov 15 '19 at 11:20
lead time says the number of weeks until a quantity purchased in week i will be arrived, therefore "on hand" — Murcielago, Nov 15 '19 at 11:24
Can you "manually" compute the expected values of your 'OH' column, allowing us to test our solutions. — AlexisBRENON, Nov 15 '19 at 11:26
Please give us the expected result. How do you handle rows where `i-lead_time` is negative or so. I updated my answer, but still not matching what you want... — AlexisBRENON, Nov 15 '19 at 11:40
Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/202402/discussion-between-alexisbrenon-and-lelionjaune). — AlexisBRENON, Nov 15 '19 at 12:20

AlexisBRENON · Answer 1 · 2019-11-18T09:54:16.060

I think you should look after itertools.accumulate to build your new row (instead of iterating your data frame rows).

This is a first attempt. I will update it to better match what you try to achieve in your edit.

diff = df['Order_Quantity'] - df['Purchase_Quantity']
acc = list(itertools.accumulate(diff))
df['on_hand'] = acc
print(df)

Edit

I think I misunderstood what you try to achieve.

Here is your base data frame:

                Purchase_Quantity  lead_time  Order_Quantity
Id  Date                                                    
001 2013-01-07                 12          4              21
    2013-01-14                 13          2              34
002 2013-01-07                 10          6              21
    2013-01-14                  6          4              13

From what I understand you On Hand column must report the number of "Purchased" items which are not arrived yet. Looking to something like this:

                Purchase_Quantity  lead_time  On_Hand
Id  Date                                                    
001 2013-01-07                 12          4       12
    2013-01-14                 13          2       25  # (12 + 13)
002 2013-01-07                 10          6       10
    2013-01-14                  6          4       16  # (10 + 6)

Did I understand well? If so, what is the Order_Quantity for?

Edit 2

Here is an new example, heavily inspired by this post, which seems to match your use case.

I changed column names to avoid confusion (what is the difference between, "Order" and "Purchase" which translate to the same word in my language...). You should also convert your lead time to datetime.timedelta object, making units and computation more clear.

import pandas as pd


def main():
    df = pd.DataFrame({
        'Id': ['001', '001', '002', '002'],
        'Date': ['2013-01-07', '2013-01-14', '2013-01-07', '2013-01-14'],
        'Ordered': [21, 34, 21, 13],
        'LeadTime': [4, 2, 6, 4],
        'Sold': [12, 13, 10, 6],
    })
    df['Date'] = pd.to_datetime(df['Date'])
    df['LeadTime'] = pd.to_timedelta(df['LeadTime'], unit="days")
    print(df)

    df['Received'] = df.apply(lambda x: df.loc[(df['Date']+df['LeadTime'] <= x['Date']) & (df['Id'] == x['Id']), "Ordered"].sum(), axis=1)
    df['Diff'] = df['Received'] - df['Sold']
    print(df)

if __name__ == '__main__':
    main()

As shown here, you probably have to do it in two steps. First build a new column for which the value depend of the current values of the row (see the linked post). Then do others computations that can be vectorized.

This do not provide the expected output still but provide a good starting point I think.

this work well without including lead time, it is much faster than a loop in my actual dataset — Murcielago, Nov 15 '19 at 11:25
yes and no aha the lead corresponds to how many below the purchase quantity will become 'on_hand' which is why i scaled it to be in weeks too. Order_quantity is what is being sold therefore not on hand in the next week — Murcielago, Nov 15 '19 at 11:39

Accumulate values according to condition in a Pandas data frame

1 Answers1

Edit

Edit 2