
I have two dataframes, df and ds:

import pandas as pd

ds = pd.DataFrame({'Price': {0: 1200, 1: 1400, 2: 1500,
                                 3: 1800},
                   'Time(s)': {0: 500, 1: 500, 2: 600,
                                 3: 500},
                   'id': {0: 'a01', 1: 'a02', 2: 'a03',
                          3: 'a04'}})

df = pd.DataFrame({'Price': {0: 1200, 1: 1500, 2: 1450,
                                 3: 1800, 4: 1200},
                   'Time(s)': {0: 500, 1: 500, 2: 500,
                                 3: 500, 4: 500},
                   'id': {0: 'a01', 1: 'a02', 2: 'a03',
                          3: 'a04', 4: 'a05 '}})

ds Output:

   Price        Time(s)     id   
   1200         500         a01 
   1400         500         a02
   1500         600         a03
   1800         500         a04        

df Output:

   Price        Time(s)     id  
   1200         500         a01 
   1500         500         a02
   1450         500         a03
   1800         500         a04
   1200         500         a05        

Desired output (df1):

   Price        Time(s)     id  
   1200         500         a05         

I want to compare them and, if the same values appear in the id column of both dataframes, drop those rows and save the outcome into a new dataframe df1. I tried:

df1 = pd.concat([ds, df])
df1.drop_duplicates(subset='id', keep='last')

but the rows with matching ids were still kept.

4 Answers


keep=False should do the trick.
P.S. It must be passed as a bool, not a string.

pd.concat([df, ds]).drop_duplicates(subset='id', keep=False, ignore_index=True)

Output:

    Price   Time(s) id
0   1200    500     a05
Utsav

This will do it:

(ds.merge(df, on='id', how='outer', indicator=True, suffixes=('_xxx', ''))
   .query('_merge != "both"')[df.columns])


The merge simulates a SQL full outer join. indicator=True adds a _merge column that shows left_only, right_only, or both; in this case anything that is not both is the solution, but in general you can be more specific with a subsequent query filter.

Other suggestions using pd.concat will work in some cases, but in my opinion they are not generic enough. Those solutions would include the unique records from both dataframes, whereas here you asked only for the ones from df.
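
If you only want the rows that exist in df and not in ds (as in the question), you can make the query filter more specific. A minimal sketch building on the merge above; since df is the right frame here, its unmatched rows are tagged right_only:

# keep only ids present in df but missing from ds
out = (ds.merge(df, on='id', how='outer', indicator=True, suffixes=('_xxx', ''))
         .query('_merge == "right_only"')[df.columns])
print(out)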

jedi

df1 = pd.merge(df, ds, how='outer', on="id", indicator='common')
df1 = df1.loc[df1['common'] != 'both']
df1
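
Note that with this outer merge the overlapping Price and Time(s) columns come back suffixed (_x from df, _y from ds, the pandas defaults). If you also want the result shaped like the desired df1, a minimal follow-up sketch is:

# keep the df-side columns and restore the original names
df1 = (df1[['Price_x', 'Time(s)_x', 'id']]
       .rename(columns={'Price_x': 'Price', 'Time(s)_x': 'Time(s)'})
       .reset_index(drop=True))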


Vectorized isin via numpy.where is also an option:

import pandas as pd
import numpy as np

ds = pd.DataFrame({'Price': {0: 1200, 1: 1400, 2: 1500,
                             3: 1800},
                   'Time(s)': {0: 500, 1: 500, 2: 600,
                               3: 500},
                   'id': {0: 'a01', 1: 'a02', 2: 'a03',
                          3: 'a04'}})

df = pd.DataFrame({'Price': {0: 1200, 1: 1500, 2: 1450,
                             3: 1800, 4: 1200},
                   'Time(s)': {0: 500, 1: 500, 2: 500,
                               3: 500, 4: 500},
                   'id': {0: 'a01', 1: 'a02', 2: 'a03',
                          3: 'a04', 4: 'a05 '}})

out = df.loc[np.where(df['id'].isin(ds['id']), False, True)]

# For Display
print(out.to_string(index=False))
Price  Time(s)   id
 1200      500  a05 
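
Since isin already returns a boolean Series, the same filter can also be written by negating the mask directly, without numpy.where; this is just a stylistic alternative:

# rows of df whose id does not appear in ds
out = df.loc[~df['id'].isin(ds['id'])]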
Henry Ecker