I have a data frame as shown below:
import numpy as np
import pandas as pd

df = pd.DataFrame({'source_code':['A250.00','C791.0','716.90','493.90','143.21','134.52'],
                   'source_description':['test1','test1','test2','test3','test4','test5'],
                   'key_id':[np.nan,np.nan,np.nan,np.nan,np.nan,np.nan]})
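# 'source_code' needs six values to match the other columns; the values here
# are assumed to mirror df['source_description'].
hash_file = pd.DataFrame({'source_id':['A250','C791','716.9','493.9','143.21','134.52'],
                          'source_code':['test1','test1','test2','test3','test4','test5'],
                          'hash_id':[911,512,713,814,616,717]})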
id_file = hash_file.set_index(['source_id','source_code'])['hash_id']
I would like to update the values of the key_id column in df by comparing its source_code and source_description columns with the source_id and source_code columns of hash_file.
So I tried the following, based on this post:
df['key_id'] = df.set_index(['source_code','source_description']).index.map(id_file)
This works fine in normal scenarios, but when the two sides differ only in formatting, such as 250 and 250.00 or 791.0 and 791, the lookup fails and produces incorrect output like below
So I tried converting them to strings, but it still doesn't work.
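To make the mismatch concrete: string conversion alone cannot help, because 'A250.00' and 'A250' are still different strings. Below is a minimal sketch of one possible workaround, assuming the codes differ only by trailing zeros after the decimal point. The helper normalize_code is hypothetical (it is not part of pandas), and the description values in hash_file are assumed to mirror those in df.

```python
import numpy as np
import pandas as pd


def normalize_code(code) -> str:
    """Hypothetical helper: drop trailing zeros after a decimal point,
    then a dangling point, so 'A250.00' and 'A250' compare equal."""
    code = str(code)
    if "." in code:
        code = code.rstrip("0").rstrip(".")
    return code


df = pd.DataFrame({
    "source_code": ["A250.00", "C791.0", "716.90", "493.90", "143.21", "134.52"],
    "source_description": ["test1", "test1", "test2", "test3", "test4", "test5"],
    "key_id": [np.nan] * 6,
})
# Assumption: the descriptions here mirror df["source_description"].
hash_file = pd.DataFrame({
    "source_id": ["A250", "C791", "716.9", "493.9", "143.21", "134.52"],
    "source_code": ["test1", "test1", "test2", "test3", "test4", "test5"],
    "hash_id": [911, 512, 713, 814, 616, 717],
})

# Build the lookup Series on normalized (code, description) keys.
id_file = (
    hash_file
    .assign(source_id=hash_file["source_id"].map(normalize_code))
    .set_index(["source_id", "source_code"])["hash_id"]
)

# Normalize the codes in df the same way before mapping.
keys = pd.MultiIndex.from_arrays(
    [df["source_code"].map(normalize_code), df["source_description"]]
)
df["key_id"] = keys.map(id_file)
```

Normalizing both sides with the same function keeps the original index.map approach intact; codes without a decimal point are left unchanged, so a value such as '100' is not shortened.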
I expect my output to be like below