0

I have three pandas dataframes: df1, df2, df3 which looks like as follows:

df1=

X Y M1_x M2_x M3_x
12 33 3 4 2
12 54 0 3 4
23 12 0 8 3

df2=

X Y M1_y M2_y M3_y
12 33 9 4 1
12 54 0 3 5
12 11 0 2 1

df3=

X Y M1_z M2_z M3_z
12 33 1 40 10
11 10 10 30 0
12 11 0 40 5

I like to concatenate the two df and get the merged dataframe as below:

result =

X Y M1_x M2_x M3_x M1_y M2_y M3_y M1_z M2_z M3_z
12 33 3 4 2 9 4 1 1 40 10
12 54 0 3 4 0 3 5 nan nan nan
23 12 0 8 3 nan nan nan nan nan nan
12 11 nan nan nan 0 2 1 0 40 5
11 10 nan nan nan nan nan nan 10 30 0

I already tried p.cocatenate function, the problem is choosing axis=1, doesn't merge the[X, Y] columns, then I have double [X,Y] columns in the result file, choosing axis=0, doesn't merge the rows, and I get the union of rows instead of merging common [X,Y] with eachother. How can I make this happen?

EDIT: I know how to use merge function when merging two dataframes. My problem here is I have more than two (actually 4) databases to merge. Is there any function to combine more than two dfs in one line?

2 Answers2

2

Use pandas.DataFrame.merge specifying how to merge (outer merge):

>>> df1.merge(df2, 'outer').merge(df3, 'outer')
   X   Y   M1_x   M2_x   M3_x  M1_y   M2_y   M3_y  M1_z   M2_z   M3_z
0  12  33    3.0    4.0   2.0    9.0    4.0   1.0    1.0   40.0  10.0
1  12  54    0.0    3.0   4.0    0.0    3.0   5.0    NaN    NaN   NaN
2  23  12    0.0    8.0   3.0    NaN    NaN   NaN    NaN    NaN   NaN
3  12  11    NaN    NaN   NaN    0.0    2.0   1.0    0.0   40.0   5.0
4  11  10    NaN    NaN   NaN    NaN    NaN   NaN   10.0   30.0   0.0

If you have more dataframes you can do something like this:

out = df1.copy()
dfs = [df2,df3,df4,df5,df6]
for df in dfs:
    out = out.merge(df, 'outer')

This diagram clarifies the different types of merge (pandas uses inner merge as default): Types of merge

Pablo C
  • 4,661
  • 2
  • 8
  • 24
  • Thanks for your reply. Well, I wanted to summerize my question and reduce the number of dfs to two. My real problem is dealing with 4 dfs. Is it still possible to use your solution? – maede nasri Apr 06 '21 at 08:48
  • @maedenasri of course, you only need to "chain" merges or looping through a list of yours dataframes while merging :) – Pablo C Apr 06 '21 at 08:50
0

Check if this can help.

new_df = pd.merge(df1, df2,  how='outer')

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html