
I have two dataframes, df and ds:

import pandas as pd

ds = pd.DataFrame({'Price': {0: 1200, 1: 1400, 2: 1500,
                                 3: 1800},
                   'Time(s)': {0: 500, 1: 500, 2: 600,
                                 3: 500},
                   'id': {0: 'a01', 1: 'a02', 2: 'a03',
                          3: 'a04'}})

df = pd.DataFrame({'Price': {0: 1200, 1: 1500, 2: 1450,
                                 3: 1800, 4: 1200},
                   'Time(s)': {0: 500, 1: 500, 2: 500,
                                 3: 500, 4: 500},
                   'id': {0: 'a01', 1: 'a02', 2: 'a03',
                          3: 'a04', 4: 'a05 '}})

ds Output:

   Price        Time(s)     id   
   1200         500         a01 
   1400         500         a02
   1500         600         a03
   1800         500         a04        

df Output:

   Price        Time(s)     id  
   1200         500         a01 
   1500         500         a02
   1450         500         a03
   1800         500         a04
   1200         500         a05        

Desired output (df1):

   Price        Time(s)     id  
   1200         500         a05         

I want to compare them and, if the same values appear in the id column of both dataframes, drop those rows and save the outcome into a new dataframe df1. I tried:

df1 = pd.concat([ds, df])
df1.drop_duplicates(subset='id', keep='last')

but the rows with matching ids were still kept.

4 Answers


keep=False should do the trick.
P.S. It must be passed as a bool, not a string.

pd.concat([df, ds]).drop_duplicates(subset='id', keep=False, ignore_index=True)

Output:

    Price   Time(s) id
0   1200    500     a05
Utsav

This will do it:

(ds.merge(df, on='id', how='outer', indicator=True, suffixes=('_xxx', ''))
   .query('_merge != "both"')[df.columns])


The merge simulates a SQL full outer join. indicator=True adds a _merge column that shows left_only, right_only, or both; in this case anything that is not both is the solution, but in general you can be more specific with a subsequent query filter.

Other suggestions using pd.concat will work in some cases, but in my opinion they are not generic enough. Those solutions would include the unique records from both dataframes, whereas here you asked only for the ones from df.
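
If you only want the rows that exist in df and not in ds (as in the question), you can make the query filter more specific. A minimal sketch building on the merge above; since df is the right frame here, its unmatched rows are tagged right_only:

# keep only ids present in df but missing from ds
out = (ds.merge(df, on='id', how='outer', indicator=True, suffixes=('_xxx', ''))
         .query('_merge == "right_only"')[df.columns])
print(out)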

jedi

df1 = pd.merge(df, ds, how='outer', on="id", indicator='common')
df1 = df1.loc[df1['common'] != 'both']
df1
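
Note that with this outer merge the overlapping Price and Time(s) columns come back suffixed (_x from df, _y from ds, the pandas defaults). If you also want the result shaped like the desired df1, a minimal follow-up sketch is:

# keep the df-side columns and restore the original names
df1 = (df1[['Price_x', 'Time(s)_x', 'id']]
       .rename(columns={'Price_x': 'Price', 'Time(s)_x': 'Time(s)'})
       .reset_index(drop=True))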


Vectorized isin via numpy.where is also an option:

import pandas as pd
import numpy as np

ds = pd.DataFrame({'Price': {0: 1200, 1: 1400, 2: 1500,
                             3: 1800},
                   'Time(s)': {0: 500, 1: 500, 2: 600,
                               3: 500},
                   'id': {0: 'a01', 1: 'a02', 2: 'a03',
                          3: 'a04'}})

df = pd.DataFrame({'Price': {0: 1200, 1: 1500, 2: 1450,
                             3: 1800, 4: 1200},
                   'Time(s)': {0: 500, 1: 500, 2: 500,
                               3: 500, 4: 500},
                   'id': {0: 'a01', 1: 'a02', 2: 'a03',
                          3: 'a04', 4: 'a05 '}})

out = df.loc[np.where(df['id'].isin(ds['id']), False, True)]

# For Display
print(out.to_string(index=False))
Price  Time(s)   id
 1200      500  a05 
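
Since isin already returns a boolean Series, the same filter can also be written by negating the mask directly, without numpy.where; this is just a stylistic alternative:

# rows of df whose id does not appear in ds
out = df.loc[~df['id'].isin(ds['id'])]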
Henry Ecker