
I am a new Python learner. I have two dataframes loaded from xlsx files in Python, and I am trying to fill Table 2 with values from Table 1.

Table 1:

|Product ID|Inventory Receipt Date|Age Days|Quantity|
|---|---|---|---|
|AA12345678|Jan 21, 2022|120|400|
|AA12345678|Jan 30, 2022|111|100|
|AA12345678|Jan 31, 2022|110|20|
|BB12345678|Jan 21, 2022|120|120|
|BB12345678|Feb 1, 2022|109|100|

Table 2:

|Location Code|Product ID|Required Quantity|
|---|---|---|
|ABCD001|AA12345678|100|
|ABCD001|AA12345678|401|
|ABCD002|AA12345678|19|
|EFGH001|BB12345678|200|
|EFGH002|BB12345678|20|

Expected Result:

|Location Code|Product ID|Required Quantity|Age days 1|Age days 2|Age days 3|
|---|---|---|---|---|---|
|ABCD001|AA12345678|100|120|||
|ABCD001|AA12345678|401|120|111|110|
|ABCD002|AA12345678|19|110|||
|EFGH001|BB12345678|200|120|109||
|EFGH002|BB12345678|20|109|||

The rule of product distribution is first come, first served. For example, 'Location Code' ABCD001 requires 100 qty ('Product ID' = 'AA12345678') on row 2 in Table 2. It takes 100 qty ('Product ID' = 'AA12345678') from row 1 of Table 1 and copies that row's 'Age Days' into Table 2. When the 'Quantity' of a row in Table 1 is used up, the lookup moves to the next row with the same 'Product ID' (AA12345678). The total 'Quantity' in Table 1 equals the total required quantity in Table 2.

I tried `df2.merge(df1[['Product ID', 'Age Days']], 'left')`, but 'Age Days' cannot be merged into df2 that way. I also tried the map function (`df2['Age Days'] = df2['Product ID'].map(df1.set_index('Product ID')['Age Days'])`), but it raises an error about the index not being "uniquely valued".

Issue: 'Product ID' is non-unique, so a plain lookup/map/merge fails. How can I get all matching results (or their indices) via lookup/map/merge or some other method? In that case, I would also need a flag called "is_empty" for checking: if "is_empty" == yes, I need to take the value from the next matched row.
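For reference, a minimal sketch (toy frames built from the tables above) reproducing both failure modes: `merge` on a non-unique key duplicates rows, and `map` against a non-unique index raises the "uniquely valued" error:

```python
import pandas as pd

# Toy frames mirroring the tables above (column names from the question)
df1 = pd.DataFrame({
    "Product ID": ["AA12345678", "AA12345678"],
    "Age Days": [120, 111],
})
df2 = pd.DataFrame({
    "Product ID": ["AA12345678"],
    "Required Quantity": [100],
})

# merge on a non-unique key duplicates each df2 row once per match
merged = df2.merge(df1, on="Product ID", how="left")
print(len(merged))  # 2 rows for the single df2 row

# map needs a unique index on the lookup Series, so this raises
# the "uniquely valued" error mentioned above
try:
    df2["Product ID"].map(df1.set_index("Product ID")["Age Days"])
except Exception as err:  # e.g. InvalidIndexError in recent pandas
    print(type(err).__name__)
```

Neither behaviour gives the sequential "consume inventory row by row" result the question needs, which is why a stateful loop (as in the answer below existed on the page) or a cumulative-quantity approach is required.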

I know this case is complex; could you point me to a way to solve it? I am not sure which keywords I should study. Thank you.

Oscar_Wong
  • Can you explain in more detail how to fill the `Age days 1`, `Age days 2` and `Age days 3` columns in the Expected Result table? Consider Product ID `AA12345678` in the Expected table: I don't understand why `Age days 1 = 120` when `Required Quantity = 100` but `Age days 1 = 110` when `Required Quantity = 19`. – Bunny Jun 28 '22 at 07:57
  • Sorry, I made a mistake in the data set: I mistyped the 'Quantity' on row 2 in Table 1; it should be 400, not 300. Let's use row 3 as an example: Product ID (AA12345678) requires 401 quantity in Table 2. Looking up Table 1 by Product ID (AA12345678), we find 3 matching records, with quantities 400 (row 2), 100 (row 3) and 20 (row 4). For row 2 in Table 2, the required qty is 100, so the quantity on row 2 in Table 1 is reduced by 100 (Table 1 qty: 400, Table 2 required qty: 100, remaining Table 1 qty = 400 - 100 = 300). – Oscar_Wong Jun 28 '22 at 09:00
  • Returning to the example of row 3: it requires 401 qty, but Table 1 row 2 only has 300 remaining, so it cannot fulfil the full required qty. It therefore fulfils 300 qty: 300 qty is allocated to ABCD001, remaining = 0, is_empty == 'yes', and a new column stores the related 'Age days' = 120 days. Then we need to fulfil another 101 required qty: look up Table 1 again, check the flag is_empty == 'no', and match 'Product ID' == AA12345678. This finds row 3 in Table 1, whose Quantity is 100, covering 100 of the remaining required qty in Table 2. – Oscar_Wong Jun 28 '22 at 09:07
  • The remaining is then 0 and is_empty == 'yes' on row 3 in Table 1, and a new column 'Age days 2' stores the 'Age days' from Table 1. Finally, the required qty equals 1, so we look up Table 1 again, check the flag is_empty == 'no', and match 'Product ID' == AA12345678. The qty in Table 1 is 20, which can fulfil the required qty (1), so a new column stores the related 'Age days 3' = 110 days. – Oscar_Wong Jun 28 '22 at 09:09
  • 1. Table 2 row 2: Product ID AA12345678, Required Quantity 100. 2. Check Table 1 by Product ID AA12345678; the first match is |AA12345678|Jan 21, 2022|120|400|. 3. Open a new column 'Age Days 1' in Table 2 and store Age Days 120; the Quantity on row 2 in Table 1 is reduced by 100, remaining = 300. – Oscar_Wong Jun 28 '22 at 09:18
  • For the next row (Table 2 row 3): Product ID AA12345678, Required Quantity 401. 4. Check Table 1 by Product ID AA12345678; the first match is |AA12345678|Jan 21, 2022|120|300|. 5. Store Age Days 120 in column 'Age Days 1' in Table 2; the Quantity on row 2 in Table 1 is reduced by 300, remaining = 0. But that cannot fulfil the required qty; 101 qty is still needed. 6. Check Table 1 again by Product ID AA12345678, Required Quantity 101; the next match is |AA12345678|Jan 30, 2022|111|100|. – Oscar_Wong Jun 28 '22 at 09:25
  • Please clarify your specific problem or provide additional details to highlight exactly what you need. As it's currently written, it's hard to tell exactly what you're asking. – Community Jun 28 '22 at 09:39
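The walk-through in these comments amounts to a first-come-first-served allocation. As a plain-Python sketch of that logic (the function name and list layout here are just for illustration, not part of the question's code):

```python
def allocate(required, lots):
    """Consume lots first-come-first-served; return the age days used.

    lots is a list of [age_days, quantity] pairs, mutated in place
    so leftovers carry over to the next requirement.
    """
    ages = []
    for lot in lots:
        if required == 0:
            break                 # requirement fully met
        age, qty = lot
        if qty == 0:
            continue              # this lot is already exhausted
        take = min(qty, required)
        lot[1] -= take
        required -= take
        ages.append(age)
    return ages

# Inventory for AA12345678 from Table 1: (age days, quantity)
lots = [[120, 400], [111, 100], [110, 20]]
print(allocate(100, lots))  # [120]  (row 1 drops from 400 to 300)
print(allocate(401, lots))  # [120, 111, 110]
print(allocate(19, lots))   # [110]
```

The three printed lists match the 'Age days 1..3' columns of the first three rows in the Expected Result table.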

1 Answer


After converting your comments into program logic, I ended up with the following:

table 1 file

  |Product ID|Inventory Receipt Date|Age Days|Quantity|
  |:---------|:---------------------|:-------|:-------|
  |AA12345678|Jan 21, 2022|120|400|
  |AA12345678|Jan 30, 2022|111|100| 
  |AA12345678|Jan 31, 2022|110|20| 
  |BB12345678|Jan 21, 2022|120|120| 
  |BB12345678|Feb 1, 2022|109|100|

table 2 file

  |Location Code|Product ID|Required Quantity|
  |-------------|----------|-----------------|
  |ABCD001|AA12345678|100| 
  |ABCD001|AA12345678|401| 
  |ABCD002|AA12345678|19| 
  |EFGH001|BB12345678|200| 
  |EFGH002|BB12345678|20|

code

import pandas as pd
import numpy as np

# convert first table from markdown to dataframe
df1 = pd.read_csv('table1', sep="|")
df1.drop(df1.columns[len(df1.columns)-1], axis=1, inplace=True)  # empty column from trailing "|"
df1.drop(df1.columns[0], axis=1, inplace=True)  # empty column from leading "|"
df1 = df1.iloc[1:]  # drop the markdown separator row
df1 = df1.reset_index()  # old index is kept as a column; dropped at the end

# convert second table from markdown to dataframe
df2 = pd.read_csv('table2', sep="|")
df2.drop(df2.columns[len(df2.columns)-1], axis=1, inplace=True)  # empty column from trailing "|"
df2.drop(df2.columns[0], axis=1, inplace=True)  # empty column from leading "|"
df2 = df2.iloc[1:]  # drop the markdown separator row
df2 = df2.reset_index()  # old index is kept as a column; dropped at the end

# Convert data type of quantities to integer
df1["Quantity"] = pd.to_numeric(df1["Quantity"])
df2["Required Quantity"] = pd.to_numeric(df2["Required Quantity"])

# create an extra column in df2
df2["Age 1"] = np.nan
df2_extra_col = 1

for i2, row2 in df2.iterrows():
    age_col = 1  # current age column
    product_id = row2["Product ID"]
    required_quantity = row2["Required Quantity"]

    # search for this product id in df1
    for i1, row1 in df1.iterrows():
        available_quantity = row1["Quantity"]

        if row1["Product ID"] == product_id:
            if available_quantity == 0:  # skip 0 quantity rows
                continue
            if available_quantity < required_quantity:  # insufficient quantity
                required_quantity -= available_quantity
                df1.loc[i1, "Quantity"] = 0  # take everything
                df2.loc[i2, "Age "+str(age_col)] = row1["Age Days"]

                # add another column to df2 if missing
                age_col += 1
                if age_col > df2_extra_col:
                    df2["Age "+str(age_col)] = np.nan
                    df2_extra_col += 1

                continue

            else:  # single delivery enough
                df2.loc[i2, "Age "+str(age_col)] = row1["Age Days"]
                df1.loc[i1, "Quantity"] -= required_quantity
                break

df2.drop(df2.columns[0], axis=1, inplace=True)
print(df2)

Result

  Location Code  Product ID  Required Quantity Age 1 Age 2 Age 3
0       ABCD001  AA12345678                100   120   NaN   NaN
1       ABCD001  AA12345678                401   120   111   110
2       ABCD002  AA12345678                 19   110   NaN   NaN
3       EFGH001  BB12345678                200   120   109   NaN
4       EFGH002  BB12345678                 20   109   NaN   NaN

Notes

  • I would highly recommend using single-stepping and breakpoints to debug my code, to understand what's happening on each line. Let me know if anything is unclear.
  • My file contents were in markdown, so some pre-processing was needed to convert them to a dataframe. If your files are already in csv, you can load them directly into a dataframe with `pd.read_csv`.
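Since the asker's files are actually .xlsx, the loading step would use `pd.read_excel` instead (this needs the `openpyxl` engine installed). A sketch, with a tiny workbook written first so it is self-contained; with real data you would skip the write and point `read_excel` at your own files (the file name here is an assumption):

```python
import pandas as pd

# Build a one-row workbook so the example runs on its own;
# replace "table1.xlsx" with your actual file in practice.
pd.DataFrame({
    "Product ID": ["AA12345678"],
    "Age Days": [120],
    "Quantity": [400],
}).to_excel("table1.xlsx", index=False)

# read_excel loads a sheet straight into a DataFrame --
# none of the markdown clean-up above is needed
df1 = pd.read_excel("table1.xlsx")
df1["Quantity"] = pd.to_numeric(df1["Quantity"])
print(df1["Quantity"].iloc[0])  # 400
```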
Bunny
  • Thank you very much. It is perfect!! My files are already in xlsx. For my study, I would like to ask about the concept of break and continue: `else: # single delivery enough df2.loc[i2, "Age "+str(age_col)] = row1["Age Days"] df1.loc[i1, "Quantity"] -= required_quantity break` Does that 'break' exit to the first for loop? I am confused about the case where the available quantity is enough for delivery and some qty remains: does the break jump to the current row or the next row? Or have I misunderstood 'iterrows()'? – Oscar_Wong Jun 29 '22 at 05:47
  • For continue: `for i2, row2 in df2.iterrows(): ... for i1, row1 in df1.iterrows(): available_quantity = row1["Quantity"] if row1["Product ID"] == product_id: if available_quantity == 0: # skip 0 quantity rows continue` — when 'continue' executes, does it jump back to the second for loop (`for i1, row1 in df1.iterrows()`) and read the value from the next row? – Oscar_Wong Jun 29 '22 at 05:51
  • My `break` statement causes the program to exit the whole loop involving `df1.iterrows()`, so the program then moves to the next row in `df2`. My `continue` statement causes the program to skip the current row in `df1` and move to the next row in `df1`. For more details on `break` and `continue` statements, read here: https://www.digitalocean.com/community/tutorials/how-to-use-break-continue-and-pass-statements-when-working-with-loops-in-python-3 – Bunny Jun 29 '22 at 06:15
  • For the case when the `available quantity >= required quantity`, we can immediately exit the inner loop iterating through `df1` since we have already obtained the necessary amount. We only keep searching `df1` when `available quantity < required quantity`. – Bunny Jun 29 '22 at 06:17
  • The `.iterrows()` is simply used to iterate through each row of a dataframe. More details here: https://stackoverflow.com/a/16476974/17627866 – Bunny Jun 29 '22 at 06:19
  • Thanks, my concept is clear after your explanation: 'break' exits the innermost loop when there are nested loops. I will study the link for more details. One more question: `df2.drop(df2.columns[0], axis=1, inplace=True)` — is that used to delete the first column? I found that axis=0 means rows (series) and axis=1 means columns; what is their purpose here? Also, what happens if inplace=False? Sorry for the many questions. – Oscar_Wong Jun 29 '22 at 06:41
  • Because my file contains markdown, doing `df = pd.read_csv('mytable', sep="|")` introduced some additional rows and columns containing characters such as `nan` and `:------`. These rows and columns are not actually part of the data, so I must get rid of them. For your case, since your files are already in .xlsx, you won't have to do any of that. `df1.drop(df1.columns[len(df1.columns)-1], axis=1, inplace=True)` and `df1.drop(df1.columns[0], axis=1, inplace=True)` are used to get rid of the last and first columns of df1. – Bunny Jun 29 '22 at 06:54
  • After dropping the unnecessary columns in df1, you need to save the result back to df1. That's what `inplace=True` does. If you don't include this, `df1` will not be modified. Another way to do this without `inplace` is `df1 = df1.drop(df1.columns[0], axis=1)` – Bunny Jun 29 '22 at 06:58
  • Thank you, I have learned a lot. I just tried to run the program with a larger data set (around 4000 rows) and it does not show the full result. With smaller data (around 40 rows) it works perfectly. – Oscar_Wong Jun 29 '22 at 07:31
  • `print(df)` will only print a small portion of the `df` if the dataframe is very large. There are other ways to view a dataframe containing thousands of rows: https://www.geeksforgeeks.org/how-to-print-an-entire-pandas-dataframe-in-python/ – Bunny Jun 29 '22 at 08:27
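The `break`/`continue` behaviour discussed in these comments can be seen in a tiny standalone sketch (plain Python, nothing pandas-specific; the names are made up for illustration):

```python
# break exits only the innermost loop; continue skips straight to the
# inner loop's next iteration. The outer loop keeps going either way.
visited = []
for outer in ["row_a", "row_b"]:
    for inner in [0, 1, 2]:
        if inner == 1:
            continue          # skip inner == 1, stay in the inner loop
        if inner == 2:
            break             # leave the inner loop; outer moves on
        visited.append((outer, inner))
print(visited)  # [('row_a', 0), ('row_b', 0)]
```

Mapping back to the answer: the outer loop plays the role of `df2.iterrows()` and the inner loop that of `df1.iterrows()`, so `break` there means "this df2 row is satisfied, move to the next df2 row".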