How to use the map function for a MultiIndex DataFrame in pandas?

Question

I have a data frame as shown below:

import numpy as np
import pandas as pd

df = pd.DataFrame({'source_code':['11','11','12','13','14',np.nan],
                   'source_description':['test1', 'test1','test2','test3',np.nan,'test5'],
                   'key_id':[np.nan,np.nan,np.nan,np.nan,np.nan,np.nan]})

I also have a `hash_file` data frame, shown below:

hash_file = pd.DataFrame({'source_id':['11','12','13','14','15'],
                          'source_code':['test1','test2','test3','test4','test5'],
                          'hash_id':[911,512,713,814,616]})
id_file = hash_file.set_index(['source_id','source_code'])['hash_id']

There are no duplicates in `id_file`: each (`source_id`, `source_code`) pair is unique.
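A quick way to confirm this before mapping, as a minimal sketch on the sample data above: `Series.map` with a Series as the lookup finds keys through the lookup's index, and a non-unique index makes that reindexing step raise `InvalidIndexError`. Because the matching is also meant to be case-insensitive, the sketch checks uniqueness again after lower-casing (an assumption that lower-casing is the normalisation you want).

```python
import pandas as pd

hash_file = pd.DataFrame({'source_id': ['11', '12', '13', '14', '15'],
                          'source_code': ['test1', 'test2', 'test3', 'test4', 'test5'],
                          'hash_id': [911, 512, 713, 814, 616]})
id_file = hash_file.set_index(['source_id', 'source_code'])['hash_id']

# map() looks keys up via the index, so each (source_id, source_code) pair must occur once.
assert id_file.index.is_unique

# For case-insensitive matching, 'Test1' and 'test1' would collide once lower-cased,
# so check uniqueness on the normalised pairs as well.
lowered = pd.MultiIndex.from_arrays([hash_file['source_id'].str.lower(),
                                     hash_file['source_code'].str.lower()])
assert lowered.is_unique
```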

Now I would like to fill in the `key_id` column of `df` by matching each (`source_code`, `source_description`) pair in `df` against the (`source_id`, `source_code`) pair in `hash_file`.

So I tried the following:

df['key_id'] = df['source_code','source_description'].map(id_file)

It threw an error:

KeyError: ('source_code', 'source_description')

So I tried another approach:

df['key_id'] = df[['source_code','source_description']].map(id_file)

It threw another error:

AttributeError: 'DataFrame' object has no attribute 'map'
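Both errors come from how the lookup is spelled rather than from `map` itself. A minimal sketch on a cut-down, two-row copy of the data (shortened for brevity): `df['a', 'b']` is read as one column whose label is the tuple `('a', 'b')`, while `df[['a', 'b']]` selects two columns. `map` is a `Series`/`Index` method, so the two columns have to become a single object first; `pd.MultiIndex.from_frame` is one way to do that. (In pandas 2.1 and later `DataFrame.map` does exist, but it is the element-wise replacement for `applymap`, so it would not look up column pairs either.)

```python
import pandas as pd

df = pd.DataFrame({'source_code': ['11', '12'],
                   'source_description': ['test1', 'test2']})

# With a flat column index, df['a', 'b'] asks for ONE column labelled by the
# tuple ('a', 'b'); no such column exists, so pandas raises KeyError.
try:
    df['source_code', 'source_description']
    raised = False
except KeyError:
    raised = True

# A list inside the brackets selects two columns and returns a DataFrame.
pair = df[['source_code', 'source_description']]

# map() lives on Series and Index; a MultiIndex built from the two columns
# is a single Index whose elements are (code, description) tuples.
keys = pd.MultiIndex.from_frame(pair)
print(raised, list(keys))
```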

I expect the output shown below. Note that either key column may contain NA values, and the matching has to be case-insensitive: the index values of `id_file` should be compared with the columns of `df` ignoring case.

I would prefer to use only the `map` approach, but any other elegant approach is also welcome.

source_code source_description  key_id
11            test1              911
11            test1              911
12            test2              512
13            test3              713
14             NaN               814
NaN           test5              616

That is, either one of the two key columns (`source_id` or `source_code`) can be `NA`. — The Great, Mar 28 '21 at 13:44
I updated the output. The code should compare both columns to fetch the key_ids. If one of those columns is `NA`, it should look at the other one and try to find a match based on it alone. — The Great, Mar 28 '21 at 13:52
I prefer `map` over `merge` because for a single key column it works fine, takes just one line of code, and is easy to understand even for non-programmers. I want to use the same `map` approach for multiple key columns, hence `map` over `merge`. — The Great, Mar 28 '21 at 13:57
Are the values in the `source_code` column of `hash_file` always unique? — Shubham Sharma, Mar 28 '21 at 14:09
The combination is always unique: `source_code` and `source_id` together form a unique key. — The Great, Mar 28 '21 at 14:16
@Vaishali It's not a dupe because the index contains `NaN` values, and the mapping should also be case-insensitive. — Shubham Sharma, Mar 28 '21 at 15:20

piterbarg · Accepted Answer · 2021-03-28T14:01:49.623

This seems to be a fairly standard merge with some renaming:

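# Match df's (source_code, source_description) pair against hash_file's (source_id, source_code) pair
(df.merge(hash_file, left_on=['source_code', 'source_description'], right_on=['source_id', 'source_code'])
    # both frames have a 'source_code' column, so pandas suffixes them _x (df) and _y (hash_file)
    .drop(columns=['key_id', 'source_id', 'source_code_y'])
    .rename(columns={'source_code_x': 'source_code', 'hash_id': 'key_id'})
)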

output


    source_code source_description  key_id
0   11          test1               911
1   11          test1               911
2   12          test2               512
3   13          test3               713
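The merge above is an inner join, so only rows whose full pair matches survive, which is why rows 4 and 5 of the input disappear. As a minimal sketch on the question's sample data, passing `how='left'` to the same call keeps every row of `df`, in its original order, and leaves `key_id` as `NaN` where no pair matches.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'source_code': ['11', '11', '12', '13', '14', np.nan],
                   'source_description': ['test1', 'test1', 'test2', 'test3', np.nan, 'test5'],
                   'key_id': [np.nan] * 6})
hash_file = pd.DataFrame({'source_id': ['11', '12', '13', '14', '15'],
                          'source_code': ['test1', 'test2', 'test3', 'test4', 'test5'],
                          'hash_id': [911, 512, 713, 814, 616]})

# how='left' keeps every row of df; rows without a matching pair get NaN in hash_id.
out = (df.merge(hash_file, how='left',
                left_on=['source_code', 'source_description'],
                right_on=['source_id', 'source_code'])
         .drop(columns=['key_id', 'source_id', 'source_code_y'])
         .rename(columns={'source_code_x': 'source_code', 'hash_id': 'key_id'}))
print(out)
```

Because `hash_file` has no duplicate pairs, the left join cannot multiply rows, so `out` has the same length as `df`.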

Using `map` (for updated input values in the question)

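# Turn the two key columns into a MultiIndex, then look each (code, description) tuple up in id_file
df['key_id'] = df.set_index(['source_code','source_description']).index.map(id_file)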

output

    source_code source_description  key_id
0   11          test1               911.0
1   11          test1               911.0
2   12          test2               512.0
3   13          test3               713.0
4   14          NaN                 NaN
5   NaN         test5               NaN
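The `map` result above leaves rows 4 and 5 as `NaN` because one half of each pair is missing, and it compares case-sensitively. The comments on the question ask for both: case-insensitive matching, and a fallback to the other column when one is `NA`. A minimal sketch under stated assumptions: keys are strings (so `.str.lower()` applies) and are lower-cased on both sides; the pair lookup is tried first; a row then falls back to `source_code` alone (against `source_id`) only when its description is missing, and to `source_description` alone (against `hash_file['source_code']`) only when its code is missing. Since only the pair is guaranteed unique, the single-column lookups keep the first `hash_file` row per value, which is also an assumption. The helper `norm` is hypothetical, and row 1's description is changed to `'Test1'` purely to exercise the case-insensitive path.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'source_code': ['11', '11', '12', '13', '14', np.nan],
                   # 'Test1' (changed from the question's 'test1') exercises case-insensitivity
                   'source_description': ['test1', 'Test1', 'test2', 'test3', np.nan, 'test5'],
                   'key_id': [np.nan] * 6})
hash_file = pd.DataFrame({'source_id': ['11', '12', '13', '14', '15'],
                          'source_code': ['test1', 'test2', 'test3', 'test4', 'test5'],
                          'hash_id': [911, 512, 713, 814, 616]})


def norm(s):
    # Hypothetical helper: lower-case string values; NaN stays NaN.
    return s.str.lower()


h_id = norm(hash_file['source_id'])
h_code = norm(hash_file['source_code'])
hashes = hash_file['hash_id'].to_numpy()

# Pair lookup: same idea as id_file, but with lower-cased keys.
pair_lookup = pd.Series(hashes, index=pd.MultiIndex.from_arrays([h_id, h_code]))

# Single-column lookups; keep the first row per value (assumption).
id_lookup = pd.Series(hashes, index=h_id.to_numpy())
id_lookup = id_lookup[~id_lookup.index.duplicated()]
code_lookup = pd.Series(hashes, index=h_code.to_numpy())
code_lookup = code_lookup[~code_lookup.index.duplicated()]

src = norm(df['source_code'])
desc = norm(df['source_description'])

# 1) Full pair, via MultiIndex.map as in the answer above.
pair_hit = pd.Series(pd.MultiIndex.from_arrays([src, desc]).map(pair_lookup), index=df.index)
# 2) Code alone, used only where the description is missing.
by_id = src.map(id_lookup).where(desc.isna())
# 3) Description alone, used only where the code is missing.
by_code = desc.map(code_lookup).where(src.isna())

df['key_id'] = pair_hit.fillna(by_id).fillna(by_code)
print(df)
```

Restricting each fallback with `.where(...)` follows the rule in the comments ("if one of those columns is `NA`, look at the other"): a row with both values present but no matching pair stays `NaN` instead of being matched on one column.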

Thanks @piterbarg. But I would like to do it via `map` instead of `merge` — The Great, Mar 28 '21 at 13:53