Merging multiple data frames causing duplicate column names

Question

This thread here suggested to use reduce to merge multiple data frames at once.

df1= pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value': np.random.randn(4)})
df2= pd.DataFrame({'key': ['B', 'D', 'E', 'F'], 'value': np.random.randn(4)})
df3= pd.DataFrame({'key': ['A', 'C', 'E', 'F'], 'value': np.random.randn(4)})
df4= pd.DataFrame({'key': ['A', 'B', 'C', 'F'], 'value': np.random.randn(4)})

df_list = [df1, df2, df3, df4]

from functools import reduce
df_merged = reduce(lambda  left,right: pd.merge(left,right,on=['key'], how='outer'), df_list )

The merge is executed, but there is a warning

<stdin>:1: FutureWarning: Passing 'suffixes' which cause duplicate columns {'value_x'} in the result is deprecated and will raise a MergeError in a future version.

and df_meged has columns with exactly duplicated names. How do I force distinct names for each column?

score 2 · Accepted Answer · answered Dec 29 '21 at 21:24

You can do

s = pd.concat([x.set_index('key') for x in df_list],axis = 1,keys=range(len(df_list)))
s.columns = s.columns.map('{0[1]}_{0[0]}'.format)
s = s.reset_index()
s
Out[236]: 
  key   value_0   value_1   value_2   value_3
0   A -1.957968       NaN -0.852135 -0.976960
1   B  1.545932 -0.276838       NaN  0.197615
2   C -2.149727       NaN -0.364382  0.349993
3   D  0.524990 -0.476655       NaN       NaN
4   E       NaN -2.135870  0.798782       NaN
5   F       NaN  1.456544 -0.255705  0.447279

score 1 · Answer 2 · answered Dec 29 '21 at 21:20

1

An outer join in this scenario is tantamount to appending on the zero axis. You therefore are better of using the two columns in the dfs as the join keys. code below

from functools import reduce
df_merged = reduce(lambda  left,right: pd.merge(left,right,on=['key', 'value'], how='outer'), df_list )

answered Dec 29 '21 at 21:20

wwnde

26,119
6
18
32

Thanks for the solution. I need to keep the columns separated. Also, in my real situation, there are more columns in each data frames. I think BENY's solution is generic covering that situation. – Tristan Tran Dec 30 '21 at 15:49

Merging multiple data frames causing duplicate column names

2 Answers2