
I have 4 dataframes:

df1
   a  b
1  0  3
2  1  4
df2
   a  b
1  0  5
2  0  6
3  1  7
df3
   a  b
1  0  2
2  1  6  
3  1  5
...

Within each group of 'a', I want to merge all 4 dataframes on a and keep every b value by putting it in its own column. The merge of df1 and df2 should look like this:

   a  b1 b2
1  0  3  5
2  0  3  6
3  1  4  7

Merge of df1, df2, df3:

   a  b1 b2 b3
1  0  3  5  2
2  0  3  6  2
3  1  4  7  6
4  1  4  7  5

I tried:

df1.assign(dummy=1).merge(df2.assign(dummy=1), on='dummy', how='outer').drop('dummy', axis=1)

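but this ignores the groups (every row of df1 is paired with every row of df2), and the single 'a' column disappears because it is split into a_x and a_y.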
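A minimal sketch of why the dummy-key merge ignores the groups, using df1 and df2 from above (rebuilt here with a default 0-based index): the constant dummy key matches every row with every row, while merging on a only pairs rows that share the same value of a.

```python
import pandas as pd

# df1 and df2 from the question (default 0-based index instead of 1-based)
df1 = pd.DataFrame({'a': [0, 1], 'b': [3, 4]})
df2 = pd.DataFrame({'a': [0, 0, 1], 'b': [5, 6, 7]})

# Dummy-key merge: every row of df1 matches every row of df2 (2 x 3 = 6 rows),
# and the two 'a' columns are kept separately as a_x and a_y
cross = (df1.assign(dummy=1)
            .merge(df2.assign(dummy=1), on='dummy', how='outer')
            .drop('dummy', axis=1))
print(cross.shape)          # (6, 4)
print(list(cross.columns))  # ['a_x', 'b_x', 'a_y', 'b_y']

# Merging on 'a' instead pairs rows only within the same group of 'a'
by_group = df1.merge(df2, on='a')
print(by_group.shape)       # (3, 3)
```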

MirekG

3 Answers


This is not a Cartesian product, but a simple merge on a across multiple dataframes.

Try this:

In [846]: df1.merge(df2, on='a').merge(df3, on='a').rename(columns={'b_x':'b1', 'b_y':'b2', 'b':'b3'})

Out[846]: 
   a  b1  b2  b3
0  0   3   5   2
1  0   3   6   2
2  1   4   7   6
3  1   4   7   5

Or, if the number of dataframes can grow, you can fold the merge over a list with functools.reduce:

In [851]: from functools import reduce    
In [852]: reduce(lambda x,y: pd.merge(x,y, on='a'), [df1, df2, df3])    
Out[852]: 
   a  b_x  b_y  b
0  0    3    5  2
1  0    3    6  2
2  1    4    7  6
3  1    4    7  5
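Within each value of a, the result is still a product of the matching rows: a=0 has 1, 2 and 1 rows in df1, df2 and df3 (1 × 2 × 1 = 2 output rows), and a=1 has 1, 1 and 2 (1 × 1 × 2 = 2), giving 4 rows in total. A minimal sketch that checks this on the question's sample data (rebuilt with a default 0-based index; df4 is left out because its data is not shown):

```python
from functools import reduce
import pandas as pd

# Sample frames from the question (default 0-based index)
df1 = pd.DataFrame({'a': [0, 1], 'b': [3, 4]})
df2 = pd.DataFrame({'a': [0, 0, 1], 'b': [5, 6, 7]})
df3 = pd.DataFrame({'a': [0, 1, 1], 'b': [2, 6, 5]})
frames = [df1, df2, df3]

# Chain inner merges on 'a' across all frames
result = reduce(lambda x, y: pd.merge(x, y, on='a'), frames)

# Number of rows per value of 'a' in each frame
sizes = [df.groupby('a').size() for df in frames]
# Multiply the aligned group sizes; a group missing from any frame would give NaN,
# which sum() skips, matching the inner join dropping that group
expected_rows = reduce(lambda x, y: x * y, sizes).sum()

print(len(result), expected_rows)  # 4 4
```

With a fourth frame, this plain reduce would suffix b a second time and produce duplicate b_x/b_y column names, which is one reason to rename the b columns before merging.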
Mayank Porwal

Rename the b column of each dataframe before merging, then use reduce to extend this to an arbitrary number of dataframes.

from functools import reduce
import pandas as pd

dfs = [df.rename(columns={'b': f'b{num+1}'}) for num, df in enumerate([df1, df2, df3])]

reduce(lambda x, y: pd.merge(x, y), dfs)

Note that by default pd.merge joins on all columns the frames share, which after the renaming is only a.
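The same idea wrapped as a small reusable function, as a sketch: merge_numbered is a hypothetical name, and it assumes every frame has a key column a and a single value column b. Passing on=key explicitly, instead of relying on the shared-column default, keeps the merge on a even if the frames happen to share other columns.

```python
from functools import reduce
import pandas as pd

def merge_numbered(frames, key='a', value='b'):
    """Hypothetical helper: rename each value column to b1, b2, ... and inner-merge on key."""
    renamed = [df.rename(columns={value: f'{value}{i}'})
               for i, df in enumerate(frames, start=1)]
    return reduce(lambda left, right: pd.merge(left, right, on=key), renamed)

# Sample frames from the question (default 0-based index)
df1 = pd.DataFrame({'a': [0, 1], 'b': [3, 4]})
df2 = pd.DataFrame({'a': [0, 0, 1], 'b': [5, 6, 7]})
df3 = pd.DataFrame({'a': [0, 1, 1], 'b': [2, 6, 5]})

result = merge_numbered([df1, df2, df3])
print(result)
#    a  b1  b2  b3
# 0  0   3   5   2
# 1  0   3   6   2
# 2  1   4   7   6
# 3  1   4   7   5
```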

Mark Wang

Use pd.DataFrame.join:

First set the index of each dataframe to 'a'. A plain loop does this in place:

for df in [df1, df2, df3]:
    df.set_index('a', inplace=True)

Next, join the other dataframes onto df1 and turn the 'a' index back into a column:

df1.join([df2, df3]).reset_index()

Output:

   a  b_x  b_y  b
0  0    3    5  2
1  0    3    6  2
2  1    4    7  6
3  1    4    7  5
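A sketch of the same join that leaves the original frames untouched: set_index without inplace=True returns new frames, and reset_index moves a back into a column. The names frames and indexed are illustrative.

```python
import pandas as pd

# Sample frames from the question (default 0-based index)
df1 = pd.DataFrame({'a': [0, 1], 'b': [3, 4]})
df2 = pd.DataFrame({'a': [0, 0, 1], 'b': [5, 6, 7]})
df3 = pd.DataFrame({'a': [0, 1, 1], 'b': [2, 6, 5]})
frames = [df1, df2, df3]

# set_index returns new frames, so df1, df2 and df3 keep 'a' as a column
indexed = [df.set_index('a') for df in frames]

# Join the remaining frames onto the first one by index, then restore 'a' as a column;
# the overlapping 'b' columns get the default _x/_y suffixes
result = indexed[0].join(indexed[1:]).reset_index()
print(result)
#    a  b_x  b_y  b
# 0  0    3    5  2
# 1  0    3    6  2
# 2  1    4    7  6
# 3  1    4    7  5
```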
Scott Boston