I have 3 dataframes with different structures, where one contains the 2 keys to link with the other two ones:
df1 = id1 id2 df2 = id1 a b1 c1 c2 df3 = id2 a b1 b2 c1
1 1 1a 1b1 1c1 1c2 11 11a 11b1 11b2 11c1
11 2 2a 2b1 2c1 2c2 12 12a 12b1 12b2 12c1
12 3 3a 3b1 3c1 3c2 13 13a 13b1 13b2 13c1
13 14 14a 14b1 14b2 14c1
2 21 21a 21b1 21b2 21c1
21 22 22a 22b1 22b2 22c1
22 23 23a 23b1 23b2 23c1
31 31a 31b1 31b2 31c1
Then I merge df1
with df2
:
df1 = pd.merge(df1, df2, on='id1', how='left')
df1 = id1 id2 a b1 c1 c2
1 1a 1b1 1c1 1c2
11 nan nan nan nan
12 nan nan nan nan
13 nan nan nan nan
2 2a 2b1 2c1 2c2
21 nan nan nan nan
22 nan nan nan nan
But when I merge with df3
I have:
df1 = pd.merge(df1, df3, on='id2', how='left')
df1 = id1 id2 a_x b1_x c1_x c2 a_y b1_y b2 c1_y
1 1a 1b1 1c1 1c2
11 nan nan nan nan 11a 11b1 11b2 11c1
12 nan nan nan nan 12a 12b1 12b2 12c1
13 nan nan nan nan 13a 13b1 13b2 13c1
2 2a 2b1 2c1 2c2
21 nan nan nan nan 21a 21b1 21b2 21c1
22 nan nan nan nan 22a 22b1 22b2 22c1
In a nutshell, when there are overlaping columns between the dataframes being merged, the method creates a new column with the sulfixes. However, I want the values to be replaced when they are coincidents columns.
What I'm trying to get is this:
df1 = id1 id2 a b1 c1 c2 b2
1 1a 1b1 1c1 1c2
11 11a 11b1 11c1 11b2
12 12a 12b1 12c1 12b2
13 13a 13b1 13c1 13b2
2 2a 2b1 2c1 2c2
21 21a 21b1 21c1 21b2
22 22a 22b1 22c1 22b2
I also tried to fillna('')
before merging the second time, but I have the same result.