1

This thread here suggested to use reduce to merge multiple data frames at once.

df1= pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value': np.random.randn(4)})
df2= pd.DataFrame({'key': ['B', 'D', 'E', 'F'], 'value': np.random.randn(4)})
df3= pd.DataFrame({'key': ['A', 'C', 'E', 'F'], 'value': np.random.randn(4)})
df4= pd.DataFrame({'key': ['A', 'B', 'C', 'F'], 'value': np.random.randn(4)})

df_list = [df1, df2, df3, df4]

from functools import reduce
df_merged = reduce(lambda  left,right: pd.merge(left,right,on=['key'], how='outer'), df_list )

The merge is executed, but there is a warning

<stdin>:1: FutureWarning: Passing 'suffixes' which cause duplicate columns {'value_x'} in the result is deprecated and will raise a MergeError in a future version.

and df_meged has columns with exactly duplicated names. How do I force distinct names for each column?

Tristan Tran
  • 1,351
  • 1
  • 10
  • 36

2 Answers2

2

You can do

s = pd.concat([x.set_index('key') for x in df_list],axis = 1,keys=range(len(df_list)))
s.columns = s.columns.map('{0[1]}_{0[0]}'.format)
s = s.reset_index()
s
Out[236]: 
  key   value_0   value_1   value_2   value_3
0   A -1.957968       NaN -0.852135 -0.976960
1   B  1.545932 -0.276838       NaN  0.197615
2   C -2.149727       NaN -0.364382  0.349993
3   D  0.524990 -0.476655       NaN       NaN
4   E       NaN -2.135870  0.798782       NaN
5   F       NaN  1.456544 -0.255705  0.447279
BENY
  • 317,841
  • 20
  • 164
  • 234
1

An outer join in this scenario is tantamount to appending on the zero axis. You therefore are better of using the two columns in the dfs as the join keys. code below

from functools import reduce
df_merged = reduce(lambda  left,right: pd.merge(left,right,on=['key', 'value'], how='outer'), df_list )
wwnde
  • 26,119
  • 6
  • 18
  • 32
  • Thanks for the solution. I need to keep the columns separated. Also, in my real situation, there are more columns in each data frames. I think BENY's solution is generic covering that situation. – Tristan Tran Dec 30 '21 at 15:49