
I have two almost identical pandas DataFrames with 5 common columns. I want to merge the second DataFrame, which has a new column, into the first.

[Image: Dataframe 1]

[Image: Dataframe 2]

But I want it to update the same row when the columns 'Lot name', 'WAFER' and 'SITE' match (highlighted in green). If they do not match, I want the value to be NaN, as shown below.

[Image: Desired output]

I have to do this for over 160 distinct columns, any of which may share matching Lot name, WAFER and SITE values.

I have tried the various merge options (left, right, outer) and concat, but just can't seem to get it right. Any help or comments are appreciated.

Edit, follow up question:

I am trying to use this in a loop, where each iteration generates a new dataframe assigned to TEMP that needs to be merged with the previous dataframe. I cannot merge with an empty dataframe as it gives a merge error. How can I achieve this?

alldata = pd.DataFrame()


for i in range(len(operation)):
    temp = data[data['OPE_NO'].isin([operation[i]])]
    temp = temp[temp['PARAM_NAME'].isin([parameter[i]])]
    temp = temp.reset_index(drop=True)
    temp = temp[["LOT",'Lot name','WAFER',"SITE","PRODUCT",'PARAM_VALUE_NUMBER']]
    temp = temp.rename(columns={'PARAM_VALUE_NUMBER':'PMRM28LEMCKLYTFR.1~'+operation[i]+'~'+parameter[i]})
    alldata.merge(temp,how='outer')
xplodnow
  • I don't understand why `pd.merge(df1, df2, how='outer', on=['Lot name', 'WAFER', 'SITE'])` doesn't work? – Corralien Mar 12 '22 at 09:16

1 Answer


The example can be done with the following code:

df1.merge(df2, how="outer")
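As a rough sketch of what this one-liner does: without `on`, `merge` joins on every column name the two frames share. The sample data below is hypothetical (the real frames were only shown as images), and `param_1` / `param_2` are made-up value column names. The key columns are passed explicitly through `on` so it is clear which columns must match.

```python
import pandas as pd

# Hypothetical sample data; column names follow the question's code.
keys = ["LOT", "Lot name", "WAFER", "SITE", "PRODUCT"]

df1 = pd.DataFrame({
    "LOT": ["A1", "A1"],
    "Lot name": ["lotA", "lotA"],
    "WAFER": [1, 1],
    "SITE": [1, 2],
    "PRODUCT": ["P", "P"],
    "param_1": [0.5, 0.6],   # made-up value column
})
df2 = pd.DataFrame({
    "LOT": ["A1", "A1"],
    "Lot name": ["lotA", "lotA"],
    "WAFER": [1, 1],
    "SITE": [2, 3],
    "PRODUCT": ["P", "P"],
    "param_2": [1.5, 1.6],   # made-up value column
})

# Outer join: rows whose keys match are combined into one row;
# rows present in only one frame get NaN in the other frame's column.
merged = df1.merge(df2, how="outer", on=keys)
print(merged)
```

Here SITE 2 appears in both frames, so it ends up as a single row with both values, while SITE 1 and SITE 3 each have NaN in the column they are missing.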

If I've misunderstood the problem, please tell me.

My English is not good, but I have a good heart to help you.

Khai Kim
  • Hi Khai Kim, thanks for the response. Your solution works when starting with 2 dataframes. How about when using it in a loop? I have edited the question to show the code I have so far. – xplodnow Mar 12 '22 at 14:13
  • The merge function needs column names for the `on` parameter, so don't start the merge loop with an empty dataframe. Instead, when i = 0, set `alldata` to that first `temp` and merge from the next iteration onward. I'm worried whether you can understand my English. – Khai Kim Mar 12 '22 at 15:02
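
One possible way to write the loop along these lines is sketched below. It is only an illustration: `data`, `operation` and `parameter` are small hypothetical stand-ins for the asker's real objects, and the new column names are shortened to `operation~parameter` (the question adds a longer prefix). Note that `merge` returns a new DataFrame, so its result has to be assigned back to `alldata`.

```python
import pandas as pd

keys = ["LOT", "Lot name", "WAFER", "SITE", "PRODUCT"]

# Hypothetical stand-ins for the asker's `data`, `operation`, `parameter`.
data = pd.DataFrame({
    "LOT": ["A1"] * 4,
    "Lot name": ["lotA"] * 4,
    "WAFER": [1, 1, 1, 1],
    "SITE": [1, 2, 2, 3],
    "PRODUCT": ["P"] * 4,
    "OPE_NO": ["op1", "op1", "op2", "op2"],
    "PARAM_NAME": ["p1", "p1", "p2", "p2"],
    "PARAM_VALUE_NUMBER": [0.5, 0.6, 1.5, 1.6],
})
operation = ["op1", "op2"]
parameter = ["p1", "p2"]

alldata = None  # no empty DataFrame; the first `temp` becomes the starting point
for ope, par in zip(operation, parameter):
    # Keep only the rows for this operation/parameter pair.
    mask = (data["OPE_NO"] == ope) & (data["PARAM_NAME"] == par)
    temp = data.loc[mask, keys + ["PARAM_VALUE_NUMBER"]]
    temp = temp.rename(columns={"PARAM_VALUE_NUMBER": f"{ope}~{par}"})

    if alldata is None:
        alldata = temp                                     # first iteration
    else:
        alldata = alldata.merge(temp, how="outer", on=keys)  # assign the result back

alldata = alldata.reset_index(drop=True)
print(alldata)
```

If `operation` is empty, `alldata` stays `None` in this sketch, so the caller would need to handle that case separately.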