Merge multiple dataframes in pandas

Question

I have two data frames: df1 as follows:

  col0 col1  col1.1  col3
0    a    d       1     6
1    b    e       5     7

And df2 as follows:

  colx  coly
0    a    10
1    b    20
2    d    50
3    e    40

How do I combine the dataframes in-place such that the final df looks like this?

  col0 col1  col1.1  col3  colx coly
0    a    d       1     6   10   50
1    b    e       5     7   20   40

As far as I understand, the merge method in pandas merges both dataframes based on a given axis. In this case, I want to merge them based on the value in another dataframe. What is the function I am looking for in this case?

You will not be able to get an "in place" solution. – lmo Aug 26 '19 at 14:13 — lmo, Aug 26 '19 at 14:13

score 4 · Accepted Answer · answered Aug 26 '19 at 14:11

4

Using merge chained twice:

mrg = df1.merge(df2, left_on='col0', right_on='colx')\
         .merge(df2, left_on='col1', right_on='colx')\
         .drop(columns=['colx_x', 'colx_y'])

Output

  col0 col1  col1.1  col3  coly_x  coly_y
0    a    d       1     6      10      50
1    b    e       5     7      20      40

answered Aug 26 '19 at 14:11

Erfan

40,971
8
66
78

I get the following error: `ValueError: You are trying to merge on object and float64 columns. If you wish to proceed you should use pd.concat` – tandem Aug 26 '19 at 14:20
Are you merging on the correct columns, so the arguments `left_on` and `right_on`. Seems like you are selecting an object/string column and a numeric/float column – Erfan Aug 26 '19 at 14:22
1

I made a mistake. my bad. apologies – tandem Aug 26 '19 at 14:22

score 3 · Answer 2 · answered Aug 26 '19 at 14:16

You could use map:

mapper = df2.set_index('colx')['coly']
df1['colx'] = df1['col0'].map(mapper)
df1['coly'] = df1['col1'].map(mapper)

print(df1)

Output:

  col0 col1  col1.1  col3  colx  coly
0    a    d       1     6    10    50
1    b    e       5     7    20    40

score 2 · Answer 3 · answered Aug 26 '19 at 14:13

2

Using applymap and get:

s=df2.set_index('colx')['coly']
df1[['colx','coly']]=df1.iloc[:,:2].applymap(lambda x: s.get(x))
print(df1)

  col0 col1  col1.1  col3  colx  coly
0    a    d       1     6    10    50
1    b    e       5     7    20    40

answered Aug 26 '19 at 14:13

anky

74,114
11
41
70

score 1 · Answer 4 · answered Aug 26 '19 at 14:16

Another solution with replace:

dct = df2.set_index('colx')['coly'].to_dict()
df1['coly_x'] = df1['col0'].replace(dct)
df1['coly_y'] = df1['col1'].replace(dct)

Output:

  col0 col1  col1.1  col3 coly_x  coly_y
0    a    d       1     6     10      50
1    b    e       5     7     20      40

score 0 · Answer 5 · answered Aug 26 '19 at 14:25

Here is a method to join on the indexes:

df1.join([df2.iloc[0:2,1].rename('colx'),
          df2.iloc[2:,1].rename('coly').reset_index(drop=True)])

Elements of the desired column are extracted, subset, and renamed. The index of the second series is reset to match. Then these are joined to df1.

this returns

  col0 col1  col1.1  col3  colx  coly
0    a    d       1     6    10    50
1    b    e       5     7    20    40

Merge multiple dataframes in pandas

5 Answers5