I have two excel sheets that I've made into pandas dataframes. The first sheet contains information about assemblies and their part numbers and whether they are bought or made. (Note: the dataframe below represents a filtered version of the original excel file that only contains the Assembly "2" and the required parts to manufacture it):
The second contains overlapping information regarding part numbers (Item Number = Part Number), but also includes others like vendor name, and price (Note again: this is a very simplified example of the sheet which contains thousands of rows and has more columns):
What I'm trying to do is loop through every Part number in df1, go through df2 and pull the matching Vendor name and put in a new dataframe df3 - and if the part number isn't listed in df2 at all, then return it as a null value. I've tried with no success using .merge()
and am wondering if the solution might be some for loop that is beyond my knowledge of python.
some additional info:
- As shown in the example, in the second sheet there is often multiple rows for a single "Item Number". This is because sometimes one part could be bought from different vendors (for instance there could be another row not included in the example for part '4' that is bought from a vendor 'D') - ideally df3 would illustrate this info
- This is my first post so apologies for the poor formatting and lack of knowledge about providing code and examples.
import pandas as pd
import numpy as np
d = {'Assembly': [1, 2, 2, 2], 'Part number': [2, 3, 4, 5], 'Type': [Buy, Buy, Buy, Make]}
df1 = pd.DataFrame(data=d)
data = {'Item Number': [2, 2, 4, 5], 'Vendor': [A, A, B, C], 'Price': [10, 10, 5, 40]}
df2 = pd.DataFrame(data=data)
data1 = {'Part Number': [2, 3, 4, 5], 'Vendor': [A, N/A, B, C]}
df3 = pd.DataFrame(data=data1)