Merge two python pandas data frames of different length but keep all rows in output data frame

Question

I have the following problem: I have two pandas data frames of different length containing some rows and columns that have common values and some that are different, like this:

df1:

      Column1  Column2  Column3
    0    a        x        x
    1    c        x        x
    2    e        x        x
    3    d        x        x
    4    h        x        x
    5    k        x        x

df2:

      ColumnA  ColumnB  ColumnC
    0    c        y        y
    1    e        z        z
    2    a        s        s
    3    d        f        f

What I want to do now is merge the two data frames so that, wherever ColumnA and Column1 have the same value, the values from that row of df2 are appended to the corresponding row in df1, like this:

df1:
    Column1  Column2  Column3  ColumnB  ColumnC
  0    a        x        x        s        s
  1    c        x        x        y        y
  2    e        x        x        z        z
  3    d        x        x        f        f
  4    h        x        x        NaN      NaN
  5    k        x        x        NaN      NaN

I know that the merge is doable through

df1.merge(df2,left_on='Column1', right_on='ColumnA')

but this command drops every row of df1 whose Column1 value has no match in df2's ColumnA. Instead, I want to keep these rows in df1 and just fill the columns that come from df2 with NaN, as shown above. Is there a smooth way to do this in pandas?
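A minimal reproduction of the behaviour described in the question, assuming the example data is built inline with pd.DataFrame (the original data source isn't shown). Because merge() performs an inner join by default, only the four keys present in both frames survive:

```python
import pandas as pd

# The two example frames from the question, rebuilt inline
df1 = pd.DataFrame({
    "Column1": ["a", "c", "e", "d", "h", "k"],
    "Column2": ["x"] * 6,
    "Column3": ["x"] * 6,
})
df2 = pd.DataFrame({
    "ColumnA": ["c", "e", "a", "d"],
    "ColumnB": ["y", "z", "s", "f"],
    "ColumnC": ["y", "z", "s", "f"],
})

# merge() defaults to how='inner', so keys without a partner ('h', 'k') are dropped
inner = df1.merge(df2, left_on="Column1", right_on="ColumnA")
print(inner)
print(len(inner))  # 4 rows instead of 6
```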

Sina · Accepted Answer · 2018-01-26T03:06:12.000


You can read the documentation here: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.merge.html

What you are looking for is a left join. The default option is an inner join. You can change this behavior by passing a different how argument:

df1.merge(df2,how='left', left_on='Column1', right_on='ColumnA')
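A runnable sketch of this left join on the question's sample data (the frames are rebuilt inline as an assumption). Dropping ColumnA afterwards is an extra step, not part of the answer, that reproduces the exact layout asked for, since ColumnA only repeats Column1 for matched rows and is NaN otherwise:

```python
import pandas as pd

df1 = pd.DataFrame({
    "Column1": ["a", "c", "e", "d", "h", "k"],
    "Column2": ["x"] * 6,
    "Column3": ["x"] * 6,
})
df2 = pd.DataFrame({
    "ColumnA": ["c", "e", "a", "d"],
    "ColumnB": ["y", "z", "s", "f"],
    "ColumnC": ["y", "z", "s", "f"],
})

# how='left' keeps every row of df1 in its original order;
# rows without a match get NaN in the columns coming from df2
result = df1.merge(df2, how="left", left_on="Column1", right_on="ColumnA")

# ColumnA is redundant with Column1, so drop it to match the desired output
result = result.drop(columns="ColumnA")
print(result)
```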

edited Jan 26 '18 at 03:06

answered Oct 12 '15 at 17:29

Sina


I think he is actually looking for `left` join :) – Mathiou Oct 12 '15 at 17:32
I will read the documentation, thank you for the fast answer! Works fine. – sequence_hard Oct 12 '15 at 17:42

score 7 · Answer 2 · answered Oct 12 '15 at 17:31


Looks like you're looking for something like a left-join. See if this example helps: http://pandas.pydata.org/pandas-docs/stable/comparison_with_sql.html#left-outer-join

You can pass the parameter how='left' to merge().
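If you also want to see which rows found a partner, merge() accepts indicator=True (a parameter not mentioned in this answer), which adds a _merge column. A small sketch on the question's data, rebuilt inline as an assumption:

```python
import pandas as pd

df1 = pd.DataFrame({
    "Column1": ["a", "c", "e", "d", "h", "k"],
    "Column2": ["x"] * 6,
    "Column3": ["x"] * 6,
})
df2 = pd.DataFrame({
    "ColumnA": ["c", "e", "a", "d"],
    "ColumnB": ["y", "z", "s", "f"],
    "ColumnC": ["y", "z", "s", "f"],
})

# indicator=True adds a '_merge' column: 'both' for matched keys,
# 'left_only' for rows of df1 with no partner in df2
result = df1.merge(df2, how="left", left_on="Column1",
                   right_on="ColumnA", indicator=True)
print(result[["Column1", "ColumnB", "_merge"]])

# Keys of df1 that found no match, similar to the SQL pattern
# LEFT OUTER JOIN ... WHERE right.key IS NULL
unmatched = result.loc[result["_merge"] == "left_only", "Column1"]
print(list(unmatched))  # ['h', 'k']
```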

answered Oct 12 '15 at 17:31

sgrg


score 5 · Answer 3 · edited Jul 31 '17 at 05:00


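You can also use merge with the on parameter, which accepts a single column name or a list of names. Because on requires the key column to have the same name in both frames, rename ColumnA first, and pass how='left' so that unmatched rows are kept:

result = df1.merge(df2.rename(columns={'ColumnA': 'Column1'}), on=['Column1'], how='left')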

For more information, follow the link.
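A self-contained sketch of this approach on the question's data, assuming ColumnA is the key that should match Column1. With a shared key name, the result contains a single Column1 and no redundant ColumnA:

```python
import pandas as pd

df1 = pd.DataFrame({
    "Column1": ["a", "c", "e", "d", "h", "k"],
    "Column2": ["x"] * 6,
    "Column3": ["x"] * 6,
})
df2 = pd.DataFrame({
    "ColumnA": ["c", "e", "a", "d"],
    "ColumnB": ["y", "z", "s", "f"],
    "ColumnC": ["y", "z", "s", "f"],
})

# on= needs the key column to have the same name in both frames
df2_renamed = df2.rename(columns={"ColumnA": "Column1"})

# Left join on the shared key; unmatched rows of df1 get NaN
result = df1.merge(df2_renamed, on=["Column1"], how="left")
print(result)
```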

edited Jul 31 '17 at 05:00

Rohit Poudel


answered Jul 31 '17 at 04:33

Nirali Khoda


score 2 · Answer 4 · answered Jun 06 '22 at 08:35

How about using "concat"?

The column contents do not need to match: concat lines the frames up on their index and fills the gaps with NaN.

import pandas as pd
from io import StringIO

csvfile = StringIO(
"""Column1  Column2 Column3
a   x   x
c   x   x
e   x   x
d   x   x
h   x   x
k   x   x
""")

csvfile_1 = StringIO(
"""ColumnA  ColumnB ColumnC
c   y   y
e   z   z
a   s   s
d   f   f
""")

# the sample data is whitespace-separated, so split on runs of whitespace
df = pd.read_csv(csvfile, sep=r'\s+', engine='python')
df_1 = pd.read_csv(csvfile_1, sep=r'\s+', engine='python')

# give the key column the same name in both frames
df_1 = df_1.rename({'ColumnA': 'Column1'}, axis='columns')

# use the key column as the index so concat can align rows on it
df.set_index('Column1', inplace=True)
df_1.set_index('Column1', inplace=True)

# column contents need not match: concat aligns on the index and fills gaps with NaN
df_final = pd.concat([df, df_1], axis=1, sort=False).rename_axis('Column1').reset_index()

print(df_final)

Output:

  Column1 Column2 Column3 ColumnB ColumnC
0       a       x       x       s       s
1       c       x       x       y       y
2       e       x       x       z       z
3       d       x       x       f       f
4       h       x       x     NaN     NaN
5       k       x       x     NaN     NaN
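One caveat worth checking: pd.concat(axis=1) defaults to join='outer', so a key that exists only in the second frame would be added as an extra row, which a left merge would not do. The sketch below uses a hypothetical key 'z' to show this, and uses DataFrame.join (an index-based left join by default, not part of the original answer) as the alternative when only the first frame's rows should be kept:

```python
import pandas as pd

# Small frames indexed by the key; 'z' is a hypothetical key present only in right
left = pd.DataFrame({"Column2": ["x", "x"]},
                    index=pd.Index(["a", "h"], name="Column1"))
right = pd.DataFrame({"ColumnB": ["s", "q"]},
                     index=pd.Index(["a", "z"], name="Column1"))

# concat(axis=1) uses join='outer' by default: 'z' appears as a new row
outer = pd.concat([left, right], axis=1, sort=False)
print(sorted(outer.index))  # ['a', 'h', 'z']

# DataFrame.join defaults to how='left' on the index: only left's keys survive
left_only = left.join(right)
print(list(left_only.index))  # ['a', 'h']
```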
