I'd like to merge (update?) two DataFrames on specific rows and columns.

First DataFrame A:

A   B   C   D   E
a   aa  s
b   bb  s
c   cc
d   dd
e   ee
...

Second DataFrame B:

A   C   D   E
b       s   s
d   s       s

Expected result:

A   B   C   D   E
a   aa  s
b   bb  s   s   s
c   cc
d   dd  s       s
e   ee
...

I can't remember the last time I wasted so much time trying to figure something out. My guess was to use:

pd.merge(A, B, on=['A', 'C', 'D', 'E'], how='left')

But it doesn't work, and I can't find anything that helps.

I'd like to point out that all the values are strings, and that non-empty values never overlap between A and B. The final DataFrame shouldn't have any duplicated columns after the merge.

DavidS1992
  • This is a merge problem, but for this specific instance you can use `A.set_index('A').combine_first(B.set_index('A')).reset_index()` – cs95 Dec 14 '18 at 17:23
  • I added a note. @coldspeed The solution you proposed leaves the values the same as in DataFrame A; only the order of the columns is different. In the question marked as a duplicate, there's no example answer that keeps the same number of columns in the resulting DataFrame (they usually duplicate values into values_x and values_y) – DavidS1992 Dec 14 '18 at 17:30
  • OK, I can see the problem. The empty values are not NaN but ''. Converting them to NaN makes your line work – DavidS1992 Dec 14 '18 at 17:34
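
The fix worked out in the comments can be sketched roughly as below. This is a minimal illustration rather than a definitive answer. It assumes the frames contain only the rows shown in the question (the rows elided by `...` are left out), that the `s` values in DataFrame A sit in column `C` (inferred from the expected result), and that empty cells are the empty string `''`. The names `A_nan`, `B_nan` and `result` are made up for the example.

```python
import pandas as pd

# Frames rebuilt from the question; empty cells are '' (assumption from the comments).
A = pd.DataFrame({
    'A': ['a', 'b', 'c', 'd', 'e'],
    'B': ['aa', 'bb', 'cc', 'dd', 'ee'],
    'C': ['s', 's', '', '', ''],
    'D': ['', '', '', '', ''],
    'E': ['', '', '', '', ''],
})
B = pd.DataFrame({
    'A': ['b', 'd'],
    'C': ['', 's'],
    'D': ['s', ''],
    'E': ['s', 's'],
})

# combine_first only fills missing values, so turn '' into NaN first.
A_nan = A.mask(A == '').set_index('A')
B_nan = B.mask(B == '').set_index('A')

result = (
    A_nan.combine_first(B_nan)        # keep A's values, fill the gaps from B
    .reindex(columns=A_nan.columns)   # restore A's column order
    .fillna('')                       # turn remaining NaN back into ''
    .reset_index()
)
print(result)
```

The original `pd.merge(A, B, on=['A', 'C', 'D', 'E'], how='left')` attempt leaves A unchanged because a row of B only matches a row of A when all four key columns are equal, which never happens here.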
