
I have two dataframes:

import pandas as pd

df = pd.DataFrame({"ID": [111, 111, 111, 222, 222, 333, 333],
                   "Name": ['aaa', 'xxx', 'yyy', 'bbb', 'xxx', 'ccc', 'yyy']})

region = pd.DataFrame({"ID":[111,222,333,111,222,111],
                       "Name":['aaa','bbb','yyy','xxx','yyy','yyy'],
                       "Region": ['west','north','east','west','north','south']})

which produce the following tables:

df:
    ID  Name
0   111 aaa
1   111 xxx
2   111 yyy
3   222 bbb
4   222 xxx
5   333 ccc
6   333 yyy
region:
    ID  Name    Region
0   111 aaa     west
1   222 bbb     north
2   333 yyy     east
3   111 xxx     west
4   222 yyy     north
5   111 yyy     south

Essentially, I want to map the Region column from region onto the data in df. But as you can see, both ID and Name contain duplicate values, so neither column can be used as a lookup key on its own. What I want to do instead is first group the data by ID, and then map the Names one group at a time (there are no duplicate Names within an ID group). For example, if we focus on ID 111, we get the following subsets:

df:
    ID  Name
0   111 aaa
1   111 xxx
2   111 yyy

region:
    ID  Name    Region
0   111 aaa     west
3   111 xxx     west
5   111 yyy     south

Within this subset, Name is unique, so we can map Region from region onto the data in df.

So far the only solution I have come up with is to manually subset df for each ID group, perform the matching, and then concat the individual tables back together. But my actual dataset contains more than 100K records, which makes this very slow. Is there a way to map the values using the above logic in a single operation?
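For reference, what I'm describing seems equivalent to a left join on the (ID, Name) pair taken together; a sketch of the kind of one-step operation I'm hoping for (rows with no match in region, such as 222/xxx, would come back as NaN):

```python
import pandas as pd

df = pd.DataFrame({"ID": [111, 111, 111, 222, 222, 333, 333],
                   "Name": ['aaa', 'xxx', 'yyy', 'bbb', 'xxx', 'ccc', 'yyy']})

region = pd.DataFrame({"ID": [111, 222, 333, 111, 222, 111],
                       "Name": ['aaa', 'bbb', 'yyy', 'xxx', 'yyy', 'yyy'],
                       "Region": ['west', 'north', 'east', 'west', 'north', 'south']})

# Left-join on the (ID, Name) pair: each row of df picks up the Region
# of the row in region with the same ID *and* Name, which is exactly
# the "group by ID, then match on Name" logic in one call.
out = df.merge(region, on=["ID", "Name"], how="left")
print(out)
```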
