Concat dataframes using rows from second dataframe which don't exist in first dataframe based on subset of columns

Question

I have 2 large dataframes as follows:

df1

C1  C2  C3  V1
a   e   k   1
b   f   j   2
d   g   h   3

df2

C1  C2  C3  V1
a   e   m   21
d   g   p   1
e   f   q   2
f   g   r   3

I want to get the following:

C1  C2  C3  V1
a   e   k   1
b   f   j   2
d   g   h   3
e   f   q   2
f   g   r   3

Where I have only included rows from df2 where C1 and C2 are not present in df1. i.e. I dropped first row of df2 as a, e was already present in df1.

I am trying:

df1['id'] = df1[['C1', 'C2']].astype(str).apply('_'.join, axis=1)
df2['id'] = df2[['C1', 'C2']].astype(str).apply('_'.join, axis=1)
df3 = df2[~df2['id'].isin(df1['id'])]
pd.concat([df1, df3])

Is there a better way to do this using some inbuilt function as I run into this kind of situation a lot with varying number of columns.

You probably want to set ['C1','C2'] to be your index, (actually a multiindex). Then see [Pandas Merging 101](https://stackoverflow.com/questions/53645882/pandas-merging-101) — smci, Apr 06 '22 at 12:58

score 1 · Answer 1 · answered Apr 06 '22 at 12:58

You can use combine_first:

(df1
 .set_index(['C1', 'C2'])
 .combine_first(df2.set_index(['C1', 'C2']))
 .reset_index()
)

output:

  C1 C2 C3  V1
0  a  e  k   1
1  b  f  j   2
2  d  g  h   3
3  e  f  q   2
4  f  g  r   3

using merge

This is a bit more complex, you need to perform a negative merge with help of indicator=True:

cols = ['C1', 'C2']
df2_only = (df1[cols]
            .merge(df2, on=cols, indicator=True, how='right')
            .query('_merge == "right_only"').drop(columns='_merge')
           )

out = pd.concat([df1, df2_only])

wwnde · Accepted Answer · 2022-04-06T13:08:12.843

1

append df2 to df1 and drop duplicates

new=df1.append(df2).drop_duplicates(subset=['C1','C2'],keep='first')

print(new)

 C1 C2 C3  V1
0  a  e  k   1
1  b  f  j   2
2  d  g  h   3
3  e  f  q   2
4  f  g  r   3

edited Apr 06 '22 at 13:08

answered Apr 06 '22 at 12:59

wwnde

26,119
6
18
32

you're missing a few rows ;) Also this would imply that the index is the same in both dataframes – mozway Apr 06 '22 at 13:01
in your [previous](https://stackoverflow.com/revisions/71767232/1) (incorrect) answer :p – mozway Apr 06 '22 at 13:07

Concat dataframes using rows from second dataframe which don't exist in first dataframe based on subset of columns

2 Answers2

using merge