I'm struggling with quite a specific issue. I have two pandas dataframes of different lengths with different indexes. For each item contained in df1, I want to look into df2 and take a couple of columns (not contained in df1), where the values of one of the df2 columns are equal to those in df1. Example:
import pandas as pd
data_1 = {'TARGET_NAME':['fishinghook', 'doorlock', 'penguin', 'ashtray', 'cat', 'elephant', 'cupcake', 'exercisebench'],
'FOOBAR':['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar'],
'ix':[320, 321, 322, 323, 324, 325, 326, 328]}
data_2 = {'IMAGE_NAME':['cat', 'penguin', 'jewelrybox', 'exercisebench', 'doorlock', 'jar', ],
'VALUES_1':['h', 'h', 'c', 'm', 'h', 'f'],
'VALUES_2':['hm', 'hl', 'cm', 'ml', 'hh', 'fl'],
'ix':[616, 617, 618, 619, 620, 621]}
desired = {'TARGET_NAME':['fishinghook', 'doorlock', 'penguin', 'ashtray', 'cat', 'elephant', 'cupcake', 'exercisebench'],
'FOOBAR':['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar'],
'PRODUCED_VALUES_1':['DROPPED', 'h', 'h', 'DROPPED', 'h', 'DROPPED', 'DROPPED', 'm'],
'ix':[320, 321, 322, 323, 324, 325, 326, 328]}
df1 = pd.DataFrame(data_1, index=data_1['ix'])
df2 = pd.DataFrame(data_2, index=data_2['ix'])
desired_df = pd.DataFrame(desired, index=desired['ix'])
df1
Out[2]:
FOOBAR TARGET_NAME ix
320 foo fishinghook 320
321 bar doorlock 321
322 foo penguin 322
323 bar ashtray 323
324 foo cat 324
325 bar elephant 325
326 foo cupcake 326
328 bar exercisebench 328
df2
Out[3]:
IMAGE_NAME VALUES_1 VALUES_2 ix
616 cat h hm 616
617 penguin h hl 617
618 jewelrybox c cm 618
619 exercisebench m ml 619
620 doorlock h hh 620
621 jar f fl 621
desired_df
Out[4]:
FOOBAR PRODUCED_VALUES_1 TARGET_NAME ix
320 foo DROPPED fishinghook 320
321 bar h doorlock 321
322 foo h penguin 322
323 bar DROPPED ashtray 323
324 foo h cat 324
325 bar DROPPED elephant 325
326 foo DROPPED cupcake 326
328 bar m exercisebench 328
I want to look at each value in df1['TARGET_NAME'] and, where it equals df2['IMAGE_NAME'], take the VALUES_1 & VALUES_2 columns from df2 and add those details to df1 (or a copy of df1). If it doesn't match anywhere in df2 (because positions are all different as well) then I want it to write something else (e.g. 'DROPPED'). Ideally I want the df1 indexes to remain the same.
Any help appreciated!