I have a data frame like as shown below
df = pd.DataFrame({'source_code':['11','11','12','13','14',np.nan],
'source_description':['test1', 'test1','test2','test3',np.nan,'test5'],
'key_id':[np.nan,np.nan,np.nan,np.nan,np.nan,np.nan]})
I also have a hash_file data frame like as shown below
hash_file = pd.DataFrame({'source_id':['11','12','13','14','15'],
'source_code':['test1','test2','test3','test4','test5'],
'hash_id':[911,512,713,814,616]})
id_file = hash_file.set_index(['source_id','source_code'])['hash_id']
There will be no duplicates in the id_file
(source_id, source_code) will always be unique
Now, I would like to fill in the key_id
column in df
based on matching entries of source_code
, source_description
with source_id
and source_code
columns from hash_file
.
So, I tried the below
df['key_id'] = df['source_code','source_description'].map(id_file)
It threw an error
KeyError: ('source_code', 'source_description')
So, I tried another approach below
df['key_id'] = df[['source_code','source_description']].map(id_file)
It threw another error
AttributeError: 'DataFrame' object has no attribute 'map'
So, I expect my output to be like as shown below. Please note that there might be NA
in between and it has to be case-insensitive. Meaning the comparison of indices in the id_file
with the columns in the df
has to be case-insensitive.
I would like to do only with map
approach. Any other elegant approach is also welcome
source_code source_description key_id
11 test1 911
11 test1 911
12 test2 512
13 test3 713
14 NaN 814
NaN test5 616