3

I am trying to join (merge) two dataframes based on values in each column. For instance, to merge by values in columns in A and B. So, having df1

    A  B  C  D  L
0   4  3  1  5  1
1   5  7  0  3  2
2   3  2  1  6  4

And df2

    A  B  E  F  L
0   4  3  4  5  1 
1   5  7  3  3  2 
2   3  8  5  5  5

I want to get a d3 with such structure

    A     B     C     D     E     F     L
0   4     3     1     5     4     5     1
1   5     7     0     3     3     3     2
2   3     2     1     6     Nan   Nan   4
3   3     8     Nan   Nan   5     5     5

Can you, please help me? I've tried both merge and join methods but havent succeed.

Thomas K
  • 39,200
  • 7
  • 84
  • 86
dbulgakov
  • 81
  • 7

1 Answers1

3

UPDATE: (for updated DFs and new desired DF)

In [286]: merged = pd.merge(df1, df2, on=['A','B'], how='outer', suffixes=('','_y'))

In [287]: merged.L.fillna(merged.pop('L_y'), inplace=True)

In [288]: merged
Out[288]:
   A  B    C    D    L    E    F
0  4  3  1.0  5.0  1.0  4.0  5.0
1  5  7  0.0  3.0  2.0  3.0  3.0
2  3  2  1.0  6.0  4.0  NaN  NaN
3  3  8  NaN  NaN  5.0  5.0  5.0

Data:

In [284]: df1
Out[284]:
   A  B  C  D  L
0  4  3  1  5  1
1  5  7  0  3  2
2  3  2  1  6  4

In [285]: df2
Out[285]:
   A  B  E  F  L
0  4  3  4  5  1
1  5  7  3  3  2
2  3  8  5  5  5

OLD answer:

you can use pd.merge(..., how='outer') method:

In [193]: pd.merge(a,b, on=['A','B'], how='outer')
Out[193]:
   A  B    C    D    E    F
0  4  3  1.0  5.0  4.0  5.0
1  5  7  0.0  3.0  3.0  3.0
2  3  2  1.0  6.0  NaN  NaN
3  3  8  NaN  NaN  5.0  5.0

Data:

In [194]: a
Out[194]:
   A  B  C  D
0  4  3  1  5
1  5  7  0  3
2  3  2  1  6

In [195]: b
Out[195]:
   A  B  E  F
0  4  3  4  5
1  5  7  3  3
2  3  8  5  5
MaxU - stand with Ukraine
  • 205,989
  • 36
  • 386
  • 419
  • Thanks for your solution. But, unfortunately, It works perfectly on sample data I've provided, but doesn't work on real data. So, in case of using it with dataframes, that have values that are not in "on merge" list, are missed. Also merge in this way creates other colls with _x or _y suffixes. Maybe there is any other way to do that? – dbulgakov Oct 24 '16 at 15:36
  • There are many ways to do that - for example using `join` or `concat` or even `map` functions. But in order to be able to give you a qualified answer we need a reproducible data sets, which would help us to understand your issue... Please read [how to make good reproducible pandas examples](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) - it's not easy, but it's very efficient – MaxU - stand with Ukraine Oct 24 '16 at 15:40
  • Sorry for that. Now example, that fully represents situation. So, having also column L I want to get same output – dbulgakov Oct 24 '16 at 15:56
  • 1
    d = {'A':[4, 5, 4], 'B':[3, 7, 2], 'L':[1, 2, 3],'C':[1, 0, 1], 'D':[5, 3, 6]} d1 = {'A':[4, 5, 3], 'B':[3, 7, 8], 'L':[1, 2, 3],'E':[4, 3, 5], 'F':[5, 3, 5]} df1 = pd.DataFrame(d) df2 = pd.DataFrame(d1) pd.merge(df1, df2, on=['A','B'], how='outer') with such code l am also getting L_x, L_y colls. But supposed output is save it's state – dbulgakov Oct 24 '16 at 15:58
  • @dbulgakov, would you please also [post](http://stackoverflow.com/posts/40221516/edit) __desired__ data set? – MaxU - stand with Ukraine Oct 24 '16 at 16:03
  • I've added desired output. – dbulgakov Oct 24 '16 at 16:23
  • @dbulgakov, now i'm confused. Where did values `4` and `5` in the `L` column come from? – MaxU - stand with Ukraine Oct 24 '16 at 17:04
  • 4 goes from line 2 (df1) -- 3 2 1 6 4(L) and 5 goes from line 2 (df2) -- 3 8 5 5 5(L) – dbulgakov Oct 24 '16 at 17:21
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/126542/discussion-between-dbulgakov-and-maxu). – dbulgakov Oct 24 '16 at 17:22