I have two dataframes:
import pandas as pd

df = pd.DataFrame({"ID": [111, 111, 111, 222, 222, 333, 333],
                   "Name": ['aaa', 'xxx', 'yyy', 'bbb', 'xxx', 'ccc', 'yyy']})
region = pd.DataFrame({"ID": [111, 222, 333, 111, 222, 111],
                       "Name": ['aaa', 'bbb', 'yyy', 'xxx', 'yyy', 'yyy'],
                       "Region": ['west', 'north', 'east', 'west', 'north', 'south']})
which produce the following tables:
df:
ID Name
0 111 aaa
1 111 xxx
2 111 yyy
3 222 bbb
4 222 xxx
5 333 ccc
6 333 yyy
region:
ID Name Region
0 111 aaa west
1 222 bbb north
2 333 yyy east
3 111 xxx west
4 222 yyy north
5 111 yyy south
Essentially, I want to map the Region column from region onto the data in df. But as you can see, both ID and Name contain duplicate values, so neither column alone is a unique key. What I want to do instead is first group the data by ID and then map the Names one group at a time (Names should be unique within each ID group). For example, if we focus on ID 111, we get the following subsets:
df:
ID Name
0 111 aaa
1 111 xxx
2 111 yyy
region:
ID Name Region
0 111 aaa west
3 111 xxx west
5 111 yyy south
And now we can map Region from region onto the data in df.
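To make the single-group case concrete, here is a minimal sketch of that mapping for ID 111, assuming (as stated above) that Name is unique within each ID group:

```python
import pandas as pd

df = pd.DataFrame({"ID": [111, 111, 111, 222, 222, 333, 333],
                   "Name": ['aaa', 'xxx', 'yyy', 'bbb', 'xxx', 'ccc', 'yyy']})
region = pd.DataFrame({"ID": [111, 222, 333, 111, 222, 111],
                       "Name": ['aaa', 'bbb', 'yyy', 'xxx', 'yyy', 'yyy'],
                       "Region": ['west', 'north', 'east', 'west', 'north', 'south']})

# Subset both tables to a single ID group
sub_df = df[df["ID"] == 111].copy()
sub_region = region[region["ID"] == 111]

# Within one ID group, Name is unique in sub_region, so a plain
# Name -> Region lookup via Series.map works
sub_df["Region"] = sub_df["Name"].map(sub_region.set_index("Name")["Region"])
# sub_df["Region"] is now ['west', 'west', 'south']
```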
So far, the only solution I have come up with is to manually subset df by ID group, perform the matching within each group, and then concat the individual tables back together. But my actual dataset contains more than 100K records, which makes this very slow. Is there a way to map the values using the above logic in a single operation?
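For reference, the slow per-group approach described above might look something like this sketch (a groupby loop with Series.map per ID group, then concat):

```python
import pandas as pd

df = pd.DataFrame({"ID": [111, 111, 111, 222, 222, 333, 333],
                   "Name": ['aaa', 'xxx', 'yyy', 'bbb', 'xxx', 'ccc', 'yyy']})
region = pd.DataFrame({"ID": [111, 222, 333, 111, 222, 111],
                       "Name": ['aaa', 'bbb', 'yyy', 'xxx', 'yyy', 'yyy'],
                       "Region": ['west', 'north', 'east', 'west', 'north', 'south']})

pieces = []
for gid, sub_df in df.groupby("ID"):
    # Restrict the lookup table to this ID group; Name is unique here
    sub_region = region[region["ID"] == gid]
    sub_df = sub_df.copy()
    sub_df["Region"] = sub_df["Name"].map(sub_region.set_index("Name")["Region"])
    pieces.append(sub_df)

# Reassemble in the original row order; unmatched Names become NaN
result = pd.concat(pieces).sort_index()
```

This is one Python-level loop iteration (plus a full scan of region) per ID group, which is what becomes the bottleneck at 100K+ rows.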