I have two dataframes:
import pandas as pd

df = pd.DataFrame({"ID": [111, 111, 111, 222, 222, 333, 333],
                   "Name": ['aaa', 'xxx', 'yyy', 'bbb', 'xxx', 'ccc', 'yyy']})
region = pd.DataFrame({"ID": [111, 222, 333, 111, 222, 111],
                       "Name": ['aaa', 'bbb', 'yyy', 'xxx', 'yyy', 'yyy'],
                       "Region": ['west', 'north', 'east', 'west', 'north', 'south']})
which produce the following tables:
df:
ID Name
0 111 aaa
1 111 xxx
2 111 yyy
3 222 bbb
4 222 xxx
5 333 ccc
6 333 yyy
region:
ID Name Region
0 111 aaa west
1 222 bbb north
2 333 yyy east
3 111 xxx west
4 222 yyy north
5 111 yyy south
Essentially, I want to map the Region column from region onto the data in df. But as you can see, both ID and Name contain duplicate values, so neither column alone is a unique key. What I want to do instead is first group the data by ID and then map the Names one group at a time (Names should be unique within each ID group). For example, if we focus on ID 111, we get the following subsets:
df:
ID Name
0 111 aaa
1 111 xxx
2 111 yyy
region:
ID Name Region
0 111 aaa west
3 111 xxx west
5 111 yyy south
And now we can map Region from region onto the data in df.
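To make the single-group case concrete, here is a minimal sketch of that mapping for ID 111, assuming (as stated above) that Name is unique within each ID group:

```python
import pandas as pd

df = pd.DataFrame({"ID": [111, 111, 111, 222, 222, 333, 333],
                   "Name": ['aaa', 'xxx', 'yyy', 'bbb', 'xxx', 'ccc', 'yyy']})
region = pd.DataFrame({"ID": [111, 222, 333, 111, 222, 111],
                       "Name": ['aaa', 'bbb', 'yyy', 'xxx', 'yyy', 'yyy'],
                       "Region": ['west', 'north', 'east', 'west', 'north', 'south']})

# Subset both tables to a single ID group
sub_df = df[df["ID"] == 111].copy()
sub_region = region[region["ID"] == 111]

# Within one ID group, Name is unique in sub_region, so a plain
# Name -> Region lookup via Series.map works
sub_df["Region"] = sub_df["Name"].map(sub_region.set_index("Name")["Region"])
# sub_df["Region"] is now ['west', 'west', 'south']
```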
So far, the only solution I have come up with is to manually subset df by ID group, perform the matching within each group, and then concat the individual tables back together. But my actual dataset contains more than 100K records, which makes this very slow. Is there a way to map the values using the above logic in a single operation?
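For reference, the slow per-group approach described above might look something like this sketch (a groupby loop with Series.map per ID group, then concat):

```python
import pandas as pd

df = pd.DataFrame({"ID": [111, 111, 111, 222, 222, 333, 333],
                   "Name": ['aaa', 'xxx', 'yyy', 'bbb', 'xxx', 'ccc', 'yyy']})
region = pd.DataFrame({"ID": [111, 222, 333, 111, 222, 111],
                       "Name": ['aaa', 'bbb', 'yyy', 'xxx', 'yyy', 'yyy'],
                       "Region": ['west', 'north', 'east', 'west', 'north', 'south']})

pieces = []
for gid, sub_df in df.groupby("ID"):
    # Restrict the lookup table to this ID group; Name is unique here
    sub_region = region[region["ID"] == gid]
    sub_df = sub_df.copy()
    sub_df["Region"] = sub_df["Name"].map(sub_region.set_index("Name")["Region"])
    pieces.append(sub_df)

# Reassemble in the original row order; unmatched Names become NaN
result = pd.concat(pieces).sort_index()
```

This is one Python-level loop iteration (plus a full scan of region) per ID group, which is what becomes the bottleneck at 100K+ rows.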