I have two dataframes, df1
and df2
. The first of these dataframes tracks the locations (i.e., ZIP codes) of specific individuals at different time points:
ID | ZIP 1 | ZIP 2 | ZIP 3 |
---|---|---|---|
1 | 55333 | N/A | 55316 |
2 | 55114 | 27265 | 27265 |
3 | 55744 | 55744 | N/A |
The second dataframe contains several columns of data pertaining to every ZIP code in the country (many of which do not appear in df1
):
ZIP | State | Tier |
---|---|---|
01001 | MA | 1 |
... | ... | ... |
27265 | NC | 2 |
55114 | MN | 4 |
55316 | MN | 7 |
55333 | MN | 5 |
55744 | MN | 3 |
I would like to merge these dataframes and append the variable names from df2
to the ends of the corresponding ZIP/time point variable in df1
to end up with something like this (Note: I removed the ZIP 3 variable for simplicity; I'd still like to append the relevant State and Tier data, as shown for ZIP 1 and ZIP 2):
ID | ZIP 1 | ZIP 2 | ZIP 1 State | ZIP 2 State | ZIP 1 Tier | ZIP 2 Tier |
---|---|---|---|---|---|---|
1 | 55333 | N/A | MN | N/A | 5 | N/A |
2 | 55114 | 27265 | MN | NC | 4 | 2 |
3 | 55744 | 55744 | MN | MN | 3 | 3 |
The closest solution I have come up with is to create multiple "merged" dataframes by merging on each individual ZIP code variable in df1
. This is obviously less than ideal, and does not resolve the variable naming issue either.
merged = pd.merge(df1, df2, left_on = 'ZIP 1', right_on = 'ZIP', how = 'left')
merged2 = pd.merge(df1, df2, left_on = 'ZIP 2', right_on = 'ZIP', how = 'left')
merged3 = pd.merge(df1, df2, left_on = 'ZIP 3', right_on = 'ZIP', how = 'left')
Any guidance would be much appreciated! :-)