-1

I have the two tables below: One table is Orders and the other table is a Master table. Both are dataframes in Pandas, Python.

Orders
-------------
Date  | item_id
--------------
672022| 123
672022| 976
672022| 532
--------------

Master Table
----------------------------------------------
item_id | Description | Supplier | Ship_From
----------------------------------------------
234     | Oranges     | CWF      | NY
341     | Nuts        | DVR      | NJ
532     | Grapes      | ETT      | CT
123     | Apples      | ERH      | CT
976     | Raspberry   | HKQ      | NY
731     | Bread       | FBE      | NJ
-----------------------------------------------

I want my final table to look like the below:

-------------------------------
Date  | item_id | Description
-------------------------------
672022| 123     | Apples
672022| 976     | Raspberry 
672022| 532     | Grapes
-------------------------------

I am unable to get the desired output in a simple way What is the best way to write joins code in Python? I am using the below code:

new = pd.merge(orders, master, on="item_id", how="left")
new = new[['Date', 'item_id', 'description']]
new.drop_duplicates(subset=None, keep='first', inplace=True)
Sriya TR
  • 29
  • 7
  • I am getting many rows when I do a right or left join because my master table is huge. I need a simple table with item_id and description – Sriya TR Jun 07 '22 at 16:55
  • I get exactly the result you want with exactly your code. I am not sure what the problem is. – Mortz Jun 07 '22 at 17:10

1 Answers1

1

This should work:

import pandas as pd
import io

data1 = '''Date  | item_id
--------------
672022| 123
672022| 976
672022| 532'''

data2 = '''item_id | Description | Supplier | Ship_From
234     | Oranges     | CWF      | NY
341     | Nuts        | DVR      | NJ
532     | Grapes      | ETT      | CT
123     | Apples      | ERH      | CT
976     | Raspberry   | HKQ      | NY
731     | Bread       | FBE      | NJ'''

df1 = pd.read_csv(io.StringIO(data1), sep='\s*\|\s*', engine='python')
df2 = pd.read_csv(io.StringIO(data2), sep='\s*\|\s*', engine='python')
df1.merge(df2[['item_id', 'Description']], on='item_id', how='inner')

output:

Date item_id Description
0 672022 123 Apples
1 672022 976 Raspberry
2 672022 532 Grapes
RJ Adriaansen
  • 9,131
  • 2
  • 12
  • 26