What would be the most efficient way to achieve the following? I have two DataFrames and want to check whether the values in DF1 exist in DF2. If they do, I want to add another column in DF1 flagging this.

DataFrame 1:

   col1
0   1
1   2
2   3
3   4
4   5

DataFrame 2:

   col1
0   5
1   6
2   7
3   8
4   9

Desired result:

   col1  Flag
0   5     Duplicate
1   6     Non-duplicate
2   7     Non-duplicate
3   8     Non-duplicate
4   9     Non-duplicate

Thanks in advance.

Ben Swann

1 Answer

Try np.where with df2.col1.isin(df.col1) as the condition, where df is DataFrame 1 and df2 is DataFrame 2. Rows of df2 whose value also appears in df get Duplicate, and all other rows get Non-duplicate:

>>> import numpy as np
>>> df2['Flag'] = np.where(df2.col1.isin(df.col1), 'Duplicate', 'Non-duplicate')
>>> df2
   col1           Flag
0     5      Duplicate
1     6  Non-duplicate
2     7  Non-duplicate
3     8  Non-duplicate
4     9  Non-duplicate
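For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes the two frames are built exactly from the sample data in the question; the intermediate name mask and the reverse-direction frame df_flagged are only illustrative and not part of the original post.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"col1": [1, 2, 3, 4, 5]})   # DataFrame 1
df2 = pd.DataFrame({"col1": [5, 6, 7, 8, 9]})  # DataFrame 2

# isin returns a boolean Series: True where a df2 value also occurs in df.
mask = df2["col1"].isin(df["col1"])

# np.where picks the first label where mask is True and the second elsewhere.
df2["Flag"] = np.where(mask, "Duplicate", "Non-duplicate")
print(df2)

# Illustrative reverse direction: flag the rows of DataFrame 1 instead.
df_flagged = df.copy()
df_flagged["Flag"] = np.where(
    df_flagged["col1"].isin(df2["col1"]), "Duplicate", "Non-duplicate"
)
print(df_flagged)
```

The question text talks about flagging DF1 while the desired result shows DF2, so the sketch covers both directions; only the value 5 is shared, so it is the only row marked Duplicate in either frame.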
U13-Forward