Merge multiple dataframes based on a common column

Question

I have three dataframes. They all share a common column, and I need to merge them on that column without losing any data.

Input

>>>df1
0 Col1  Col2  Col3
1 data1  3      4
2 data2  4      3
3 data3  2      3
4 data4  2      4
5 data5  1      4

>>>df2
0 Col1  Col4  Col5
1 data1  7      4
2 data2  6      9
3 data3  1      4

>>>df3
0 Col1  Col6  Col7
1 data2  5      8
2 data3  2      7
3 data5  5      3

Expected Output

>>>df
0 Col1  Col2  Col3  Col4 Col5  Col6  Col7
1 data1  3      4    7    4
2 data2  4      3    6    9     5     8
3 data3  2      3    1    4     2     7
4 data4  2      4
5 data5  1      4               5     3

Please show your attempts based on what you found in your research and we can explain why it didn't work as expected. — roganjosh, Sep 07 '18 at 12:57
I have done this but some of the rows are missing `dfs = [df3,df1,df2] df_final = reduce(lambda left,right: pd.merge(left,right,on='Col1'), dfs)` — FunnyCoder, Sep 07 '18 at 12:59
`df = pd.concat([df1,df2,df3],axis=1,sort=False).reset_index()` `df.rename(columns = {'index':'Col1'})` — , Jul 02 '21 at 10:56

score 51 · Accepted Answer · answered Sep 07 '18 at 13:08

Use merge (with how='outer', so no keys are dropped) and reduce

In [86]: from functools import reduce

In [87]: reduce(lambda x,y: pd.merge(x,y, on='Col1', how='outer'), [df1, df2, df3])
Out[87]:
    Col1  Col2  Col3  Col4  Col5  Col6  Col7
0  data1     3     4   7.0   4.0   NaN   NaN
1  data2     4     3   6.0   9.0   5.0   8.0
2  data3     2     3   1.0   4.0   2.0   7.0
3  data4     2     4   NaN   NaN   NaN   NaN
4  data5     1     4   NaN   NaN   5.0   3.0

Details

In [88]: df1
Out[88]:
    Col1  Col2  Col3
0  data1     3     4
1  data2     4     3
2  data3     2     3
3  data4     2     4
4  data5     1     4

In [89]: df2
Out[89]:
    Col1  Col4  Col5
0  data1     7     4
1  data2     6     9
2  data3     1     4

In [90]: df3
Out[90]:
    Col1  Col6  Col7
0  data2     5     8
1  data3     2     7
2  data5     5     3
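For reference, here is a self-contained sketch of the same approach that also shows why the attempt in the question's comments lost rows: `pd.merge` defaults to `how='inner'`, which keeps only keys present in every frame. The helper name `merge_all` is just an illustrative choice, not part of pandas.

```python
from functools import reduce

import pandas as pd

# Rebuild the three frames from the question.
df1 = pd.DataFrame({"Col1": ["data1", "data2", "data3", "data4", "data5"],
                    "Col2": [3, 4, 2, 2, 1],
                    "Col3": [4, 3, 3, 4, 4]})
df2 = pd.DataFrame({"Col1": ["data1", "data2", "data3"],
                    "Col4": [7, 6, 1],
                    "Col5": [4, 9, 4]})
df3 = pd.DataFrame({"Col1": ["data2", "data3", "data5"],
                    "Col6": [5, 2, 5],
                    "Col7": [8, 7, 3]})


def merge_all(frames, key="Col1", how="outer"):
    """Fold a list of frames into one by merging pairwise on `key`.

    `merge_all` is a hypothetical helper name used for illustration.
    """
    return reduce(lambda left, right: pd.merge(left, right, on=key, how=how), frames)


# Outer join: every key that appears in any frame is kept (5 rows).
outer = merge_all([df1, df2, df3])

# Inner join (the pd.merge default): only keys present in all frames survive,
# here data2 and data3, which is why rows went missing in the original attempt.
inner = merge_all([df3, df1, df2], how="inner")
```

The missing cells become NaN, which is also why Col4 to Col7 are shown as floats in the output above.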

I get new column names; The common column has the right name, but the names for the rest of the columns change to value_x, value_y, value_x ... — PM0087, Mar 23 '21 at 18:17
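A likely cause of the `_x` / `_y` names in the comment above is that the frames share non-key column names; `merge` then appends suffixes to tell them apart. The sketch below uses two small made-up frames with a clashing `value` column (the data and the `_a` / `_b` names are assumptions for illustration) and shows two ways to control the result.

```python
import pandas as pd

# Two hypothetical frames that share a non-key column called "value".
a = pd.DataFrame({"Col1": ["data1", "data2"], "value": [1, 2]})
b = pd.DataFrame({"Col1": ["data2", "data3"], "value": [3, 4]})

# Default behaviour: overlapping columns get the suffixes _x and _y.
clashing = pd.merge(a, b, on="Col1", how="outer")

# Option 1: choose the suffixes yourself.
named = pd.merge(a, b, on="Col1", how="outer", suffixes=("_a", "_b"))

# Option 2: rename the clashing columns before merging, so no suffix is needed.
renamed = pd.merge(
    a.rename(columns={"value": "value_a"}),
    b.rename(columns={"value": "value_b"}),
    on="Col1",
    how="outer",
)
```

Renaming up front tends to scale better when more than two frames are folded together with `reduce`, since each column then has a unique name before any merge happens.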

Space Impact · Answer 2 · 2018-09-07T13:10:36.257

20

Using pd.concat:

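# Index each frame by the shared key (inplace=True modifies df1, df2 and df3 themselves)
df1.set_index('Col1',inplace=True)
df2.set_index('Col1',inplace=True)
df3.set_index('Col1',inplace=True)
df = pd.concat([df1,df2,df3],axis=1,sort=False).reset_index()
# rename returns a new frame, so assign the result; this only matters
# if the index name was lost and the key column came back as 'index'
df = df.rename(columns = {'index':'Col1'})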

    Col1    Col2    Col3    Col4    Col5    Col6    Col7
0   data1   3       4       7.0     4.0     NaN     NaN
1   data2   4       3       6.0     9.0     5.0     8.0
2   data3   2       3       1.0     4.0     2.0     7.0
3   data4   2       4       NaN     NaN     NaN     NaN
4   data5   1       4       NaN     NaN     5.0     3.0
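A variant of the same idea that leaves the original frames untouched might look like the sketch below. It assumes the values in `Col1` are unique within each frame (column-wise `concat` aligns on the index and can fail on duplicate labels) and a pandas version that accepts `sort=` (0.23.0 or later, per the comments on this answer). `rename_axis` is used so the key column is named `Col1` whether or not the index name survives the concat.

```python
import pandas as pd

df1 = pd.DataFrame({"Col1": ["data1", "data2", "data3", "data4", "data5"],
                    "Col2": [3, 4, 2, 2, 1],
                    "Col3": [4, 3, 3, 4, 4]})
df2 = pd.DataFrame({"Col1": ["data1", "data2", "data3"],
                    "Col4": [7, 6, 1],
                    "Col5": [4, 9, 4]})
df3 = pd.DataFrame({"Col1": ["data2", "data3", "data5"],
                    "Col6": [5, 2, 5],
                    "Col7": [8, 7, 3]})

# set_index without inplace returns new frames, so df1..df3 keep their Col1 column.
indexed = [frame.set_index("Col1") for frame in (df1, df2, df3)]

combined = (
    pd.concat(indexed, axis=1, sort=False)  # align rows on the Col1 index
      .rename_axis("Col1")                  # make sure the index is named Col1
      .reset_index()                        # turn the index back into a column
)
```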

edited Sep 07 '18 at 13:10

answered Sep 07 '18 at 13:03


`Traceback (most recent call last): File "extraction.py", line 291, in df_final = pd.concat([df0,df1,df2,df3,df4,df5,df6,df7],axis=1,sort=False).reset_index(drop=True) TypeError: concat() got an unexpected keyword argument 'sort' ` – FunnyCoder Sep 07 '18 at 13:37
@FunnyCoder The error might be due to your `pandas` version; mine is `'0.23.4'`. If yours is older, remove `sort=False` and try again. The `sort` parameter was added in `pandas=0.23.0`. – Space Impact Sep 07 '18 at 13:38

My version is `0.18.1`. I removed the `sort` parameter and it worked fine. – FunnyCoder Sep 10 '18 at 05:16

score 5 · Answer 3 · answered Sep 07 '18 at 12:58


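You can chain merges. Note that how='left' keeps every row of df1 only; use how='outer' if df2 or df3 may contain keys that df1 lacks: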

df1.merge(df2, how='left', left_on='Col1', right_on='Col1').merge(df3, how='left', left_on='Col1', right_on='Col1')
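To make the difference between the two join types concrete, here is a small sketch with made-up data in which the right-hand frame contains a key (`data9`, a hypothetical value not in the question) that the left-hand frame lacks.

```python
import pandas as pd

left = pd.DataFrame({"Col1": ["data1", "data2"], "Col2": [3, 4]})
# "data9" is a hypothetical key that does not exist in `left`.
right = pd.DataFrame({"Col1": ["data2", "data9"], "Col4": [6, 1]})

# Left join: keeps the keys of `left` only, so data9 is dropped.
left_join = left.merge(right, how="left", on="Col1")

# Outer join: keeps the union of keys, so data9 appears with NaN in Col2.
outer_join = left.merge(right, how="outer", on="Col1")
```

With the frames in the question a left join happens to lose nothing, because df1 already contains every key.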

ignoring_gravity

If I have more than 3 columns, do I need to extend that chain? – FunnyCoder Sep 07 '18 at 13:03

Take a look at @Zero's solution for a way to do it without chaining merges explicitly – ignoring_gravity Sep 07 '18 at 13:14
Yes, I got it. @Sandeep's answer is also working fine – FunnyCoder Sep 07 '18 at 13:22

score 3 · Answer 4 · answered Sep 07 '18 at 12:59

Try this line of code here:

 df.set_index('key').join(df2.set_index('key'))

Check the documentation to see how the 'key' is used, so you can adapt this to your own code: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.join.html

Set the 'key' equal to the column you wish to merge with the rest!
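Applied to the three frames from the question, this could look like the sketch below. It relies on `DataFrame.join` accepting a list of frames, in which case the join is done on the index, and it assumes the `Col1` values are unique within each frame. `how="outer"` is an addition here so that no keys are dropped.

```python
import pandas as pd

df1 = pd.DataFrame({"Col1": ["data1", "data2", "data3", "data4", "data5"],
                    "Col2": [3, 4, 2, 2, 1],
                    "Col3": [4, 3, 3, 4, 4]})
df2 = pd.DataFrame({"Col1": ["data1", "data2", "data3"],
                    "Col4": [7, 6, 1],
                    "Col5": [4, 9, 4]})
df3 = pd.DataFrame({"Col1": ["data2", "data3", "data5"],
                    "Col6": [5, 2, 5],
                    "Col7": [8, 7, 3]})

key = "Col1"  # the shared column to join on

joined = (
    df1.set_index(key)
       .join([df2.set_index(key), df3.set_index(key)], how="outer")
       .rename_axis(key)   # keep the key name even if the join drops it
       .reset_index()      # move the key back into a regular column
)
```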

Hope this helps.
