How can I get the difference between two DataFrames? For example, I have two DataFrames:

import pandas as pd

previous_asks = pd.DataFrame({'price': [1, 2, 3], 'amount': [10, 20, 30]})
current_asks = pd.DataFrame({'price': [1, 2, 3, 4], 'amount': [11, 20, 30, 40]})

I would like to get the rows of current_asks that are not in previous_asks:

{'price': [1, 4], 'amount': [11, 40]}
user45245

1 Answer

Using pandas:

import pandas as pd

a1 = list(range(10))
a2 = list(range(5, 8))

b1 = list('abcdefghij')
b2 = list('efy')

df1 = pd.DataFrame({'price': a1, 'amount': b1})
df2 = pd.DataFrame({'price': a2, 'amount': b2})

# For each column, keep the values of df1 that do not appear
# in the same column of df2 (columns are compared independently).
dict_results = dict()
for col in df1:
    dict_results[col] = df1.loc[~df1[col].isin(df2[col].values), col].values
    print('--', col, dict_results[col])

Gives:

-- price [0 1 2 3 4 8 9]
-- amount ['a' 'b' 'c' 'd' 'g' 'h' 'i' 'j']
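
Note that the loop above compares each column on its own, so it does not tell you which whole rows differ. If you want the rows of current_asks that have no exact match in previous_asks, as in the question, one possible sketch is a left merge with indicator=True. This is a swapped-in technique, not the loop above, and the names merged and diff are only illustrative:

```python
import pandas as pd

previous_asks = pd.DataFrame({'price': [1, 2, 3], 'amount': [10, 20, 30]})
current_asks = pd.DataFrame({'price': [1, 2, 3, 4], 'amount': [11, 20, 30, 40]})

# Left merge on all shared columns; the extra '_merge' column says
# whether each row of current_asks was also found in previous_asks.
merged = current_asks.merge(previous_asks, how='left', indicator=True)

# Keep only the rows that exist in current_asks alone.
diff = merged.loc[merged['_merge'] == 'left_only', ['price', 'amount']]
diff = diff.reset_index(drop=True)
print(diff)
```

Here the rows (1, 11) and (4, 40) are kept: the first because its amount changed, the second because it is new.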

Using plain Python 3 sets:

set1 = set(a1)
set2 = set(a2)
print(set1 - set2)

Gives:

{0, 1, 2, 3, 4, 8, 9}

I would rather go with plain Python here, since I think it is much simpler and more readable. If you originally have pandas DataFrames, I would convert them to sets, manipulate those, and convert the result back to a pd.DataFrame if necessary.
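
A minimal sketch of that round trip, using the DataFrames from the question and treating each row as a tuple. The variable names are only illustrative, and it assumes the rows contain hashable values and that row order does not matter:

```python
import pandas as pd

previous_asks = pd.DataFrame({'price': [1, 2, 3], 'amount': [10, 20, 30]})
current_asks = pd.DataFrame({'price': [1, 2, 3, 4], 'amount': [11, 20, 30, 40]})

# Turn every row into a plain tuple so that rows can be stored in a set.
previous_rows = set(previous_asks.itertuples(index=False, name=None))
current_rows = set(current_asks.itertuples(index=False, name=None))

# Rows present in current_asks but not in previous_asks.
# Sets are unordered, so sort to get a stable result.
new_rows = sorted(current_rows - previous_rows)

# Convert back to a DataFrame with the original column names.
diff = pd.DataFrame(new_rows, columns=current_asks.columns)
print(diff)
```

Sets drop duplicate rows, so this only fits data where duplicates carry no meaning.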

It is also worth checking out the unique() method of pd.Series.
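
For instance, a small sketch of what unique() does (the sample values are made up for illustration):

```python
import pandas as pd

prices = pd.Series([1, 2, 2, 3, 1])

# unique() returns the distinct values in order of first appearance.
distinct_prices = prices.unique()
print(distinct_prices)
```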

fmv1992