Combine Dataframes in Python with same and different Column Names

Question

Hello I have 2 dataframes I want to combine

dataframe 1 :

ID	A	B	C
row1	1	2	3
row2	4	5	6

dataframe 2:

ID	A	B	D
row1	6	7	8

and I want them to merge and replace values of the same row to the values on dataframe 2 like this:

ID	A	B	C	D
row1	6	7	3	8
row2	4	5	6	null

how do I do this? I tried merging and concatenation but it doesn't seem to work. Thank you

perhaps this one may help ? https://stackoverflow.com/a/43735493/3270433 — FerdousTheWebCoder, Jul 06 '22 at 15:35
@PrakashDahal this is just a sample, I need to use the code for much larger data that is always changing that need to replace data on rows and columns that already exist and add columns that aren't on the original dataframe — woops, Jul 06 '22 at 15:36

Corralien · Answer 1 · 2022-07-07T04:29:02.947

4

Another method to merge your 2 dataframes:

>>> pd.concat([df1, df2]).groupby('ID').last().reset_index()
     ID  A  B    C    D
0  row1  6  7  3.0  8.0
1  row2  4  5  6.0  NaN

Solution enhanced by @PierreD:

This assumes ID is not the index, however (if it is, then it is lost). If you reformulate as pd.concat([df1, df2]).groupby('ID').last(), then it works in both cases, and makes ID the index. You can of course then .reset_index() if that's not desired.

edited Jul 07 '22 at 04:29

answered Jul 06 '22 at 15:56

Corralien

109,409
8
28
52

1

Nice, clean and going from first principles. This assumes `ID` is **not** the index, however (if it is, then it is lost). If you reformulate as `pd.concat([df1, df2]).groupby('ID').last()`, then it works in both cases, and makes `ID` the index. You can of course then `.reset_index()` if that's not desired. – Pierre D Jul 06 '22 at 16:21
Thanks @PierreD for your comment. I updated my solution with your suggestion. – Corralien Jul 07 '22 at 04:29

Pierre D · Answer 2 · 2022-07-06T16:14:55.430

2

Assuming ID is the index in both DataFrames (if not, make it so): There is actually a function combine_first():

out = df2.combine_first(df1)
>>> out
      A  B  C    D
ID                
row1  6  7  3  8.0
row2  4  5  6  NaN

Notes:

why is column D of type float? Because of that NaN.
what if the rows are in different order, e.g. df1 has row2 first and then row1? Not a problem at all and the result is exactly the same as above (with rows sorted). Tested with pandas=1.4.2 and also pandas=1.3.4.

edited Jul 06 '22 at 16:14

answered Jul 06 '22 at 15:42

Pierre D

24,012
7
60
96

This might not work if the rows are interchanged in 1st df? – Prakash Dahal Jul 06 '22 at 15:46
@PrakashDahal: yes it works regardless of the respective ordering of rows in either `df`. – Pierre D Jul 06 '22 at 15:48
@PierreD have you checked it in notebook? It did not work if the row1 and row2 placed are interchanged in df1 – Prakash Dahal Jul 06 '22 at 15:50
@PrakashDahal Yes. I always check my answers, or if not possible I clearly indicate so. It works with `pandas=1.4.2`, perhaps you have an older version? If you find a version where that didn't work, would you kindly indicate so (and try to find the first where it does work correclty)? – Pierre D Jul 06 '22 at 15:54
In first df, `row2 4 5 6` `row1 1 2 3`, In 2nd df, `row1 6 7 8`, is this still working? My version is `1.3.5` – Prakash Dahal Jul 06 '22 at 16:12
Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/246206/discussion-between-pierre-d-and-prakash-dahal). – Pierre D Jul 06 '22 at 16:15

Combine Dataframes in Python with same and different Column Names

2 Answers2