0

I have two dataframes of different sizes, both have many columns and I need to compare two columns that have a different name and if there is a match then add the value of another column to a new column. This is like a VLOOKUP in excel, search the ID of the dataframe 1 in the company_id of dataframe 2 and if there are coincidences insert in the Qty value in the corresponding row of the dataframe 1

df1:

     ID    Area    Dept
0   IDX1    A      Dept 21
1   IDX2    B      Dept 2
2   IDX3    C      Dept 3
3   IDX4    D      Dept 3

df2:

  company_id   Age    Qty
0   ID01       42     10
1   IDX4       40     162
2   ID02       37     17
3   IDX1       42     100
4   ID24       40     12
5   IDX2       37     170
6   ID21       42     10
7   IDX3       40     120
8   ID02       37     17

this is the output that I need:

df3:
     ID    Area    Dept     extracted_qty
0   IDX1    A      Dept 21      100
1   IDX2    B      Dept 2       170
2   IDX3    C      Dept 3       120
3   IDX4    D      Dept 3       162
BadRobot
  • 265
  • 1
  • 9

2 Answers2

0

You could use a merge with a groupby:

df3 = df1.merge(df2.groupby("company_id")["Qty"].sum().rename("extracted_qty"),
                left_on="ID", 
                right_on="company_id",
                how="left")

     ID Area     Dept  extracted_qty
0  IDX1    A  Dept 21            100
1  IDX2    B   Dept 2            170
2  IDX3    C   Dept 3            120
3  IDX4    D   Dept 3            162
not_speshal
  • 22,093
  • 2
  • 15
  • 30
  • thanks for your answer, there is a way to add the column Qty as a new one? for example add the column Qty as "extracted_qty" – BadRobot Dec 15 '21 at 18:50
  • Sure. You can just rename. I edited! I also assumed that if there are multiple rows for a single ID in `df2` you want the sum of the "Qty" column. – not_speshal Dec 15 '21 at 18:56
  • could there be a problem if in dataframe 1 there was also a column called Qyt? That is, would a merge of the Qty column of dataframe 1 and Qty of dataframe 2? I ask because there can be columns with the same name in both dataframes, but I would like to keep the original dataframe 1 only with the extracted qty column from dataframe 2, after doing the match – BadRobot Dec 15 '21 at 18:58
  • Shouldn't be an issue after the renaming. – not_speshal Dec 15 '21 at 18:58
  • thanks I'll try with your suggestion – BadRobot Dec 15 '21 at 19:00
  • If the company_id in `df2` are all unique, you don't need to `groupby`. Just the `merge` is enough. I just assumed `df2` was not all unique and that you might need the sum (like Excel SUMIF) – not_speshal Dec 15 '21 at 19:00
0

Use merge:

mapping = {'company_id': 'ID', 'Qty': 'extracted_qty'}
out = df1.merge(df2.rename(columns=mapping)[['ID', 'extracted_qty']], on='ID')
print(out)

# Output:
     ID Area     Dept  extracted_qty
0  IDX1    A  Dept 21            100
1  IDX2    B   Dept 2            170
2  IDX3    C   Dept 3            120
3  IDX4    D   Dept 3            162
Corralien
  • 109,409
  • 8
  • 28
  • 52