
I am using a solution from this thread to get the difference between two data frames (the rows of `df1` that do not appear in `df2`):

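```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3, 3], 'B': [2, 3, 4, 4]})
df2 = pd.DataFrame({'A': [1], 'B': [2]})

# Keep the rows of df1 whose (A, B) pair does not appear in df2
dfx = df1[~df1.apply(tuple, 1).isin(df2.apply(tuple, 1))]
```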
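For comparison, here is a minimal sketch of the same filter without `apply`. It assumes both frames share the key columns `A` and `B`, and it swaps the per-row tuples for `pd.MultiIndex.from_frame` plus `MultiIndex.isin`. In recent pandas versions this avoids creating one Python tuple object per row; whether that saves enough memory for a 1 GB frame is untested.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3, 3], 'B': [2, 3, 4, 4]})
df2 = pd.DataFrame({'A': [1], 'B': [2]})

# Build a MultiIndex from the key columns of each frame; the keys stay in
# typed per-level arrays instead of a column of Python tuples.
keys1 = pd.MultiIndex.from_frame(df1[['A', 'B']])
keys2 = pd.MultiIndex.from_frame(df2[['A', 'B']])

# isin returns a NumPy boolean array aligned with df1's rows;
# negate it to keep the rows whose (A, B) pair is absent from df2.
dfx = df1[~keys1.isin(keys2)]
print(dfx)
```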

DataFrame `df1` is about 1 GB and `df2` is about 100 MB. I get a `MemoryError` during the `apply` call.

Is this expected, and is there a workaround?
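One possible workaround, sketched under assumptions: process `df1` in slices so that only one slice's temporary key arrays exist at a time. The helper name `anti_join_chunked` and its default `chunk_size` are hypothetical, illustrative choices, and the key columns are passed explicitly. The lookup keys for `df2` are built once; the final filtered result still has to fit in memory.

```python
import numpy as np
import pandas as pd


def anti_join_chunked(left, right, cols, chunk_size=1_000_000):
    """Hypothetical helper: return rows of `left` whose `cols` values are not in `right`.

    `left` is handled chunk_size rows at a time, so only one chunk's
    temporary MultiIndex is alive at once.
    """
    # Build the lookup keys for the smaller frame once, without duplicate rows.
    right_keys = pd.MultiIndex.from_frame(right[cols].drop_duplicates())
    mask = np.empty(len(left), dtype=bool)
    for start in range(0, len(left), chunk_size):
        stop = start + chunk_size
        # Slice rows first, then select columns, so only the chunk is copied.
        chunk_keys = pd.MultiIndex.from_frame(left.iloc[start:stop][cols])
        # True where this chunk's key does not occur in `right`.
        mask[start:stop] = ~chunk_keys.isin(right_keys)
    # Positional boolean mask, so any index type on `left` works.
    return left[mask]


df1 = pd.DataFrame({'A': [1, 2, 3, 3], 'B': [2, 3, 4, 4]})
df2 = pd.DataFrame({'A': [1], 'B': [2]})
dfx = anti_join_chunked(df1, df2, ['A', 'B'], chunk_size=2)
print(dfx)
```

Smaller chunks lower peak memory at the cost of more Python-level loop iterations.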

Tristan Tran
  • Why not try the `merge` method in the accepted answer? `dfx = df1.merge(df2, indicator=True, how='left').loc[lambda x: x['_merge'] != 'both']` – not_speshal Dec 17 '21 at 21:12
  • This solution gives a similar error: `numpy.core._exceptions.MemoryError: Unable to allocate 735. MiB for an array with shape (96378858,) and data type int64` – Tristan Tran Dec 17 '21 at 21:27
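One possible contributor to the `merge` error (an assumption, not confirmed by the thread): a left merge emits one output row per match, so repeated rows in `df2` make the merged result longer than `df1`. A minimal sketch that drops duplicates from `df2` before merging and keeps only the `left_only` rows, on toy data with a deliberately repeated row:

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3, 3], 'B': [2, 3, 4, 4]})
# Toy df2 with a duplicated row, to show the row multiplication.
df2 = pd.DataFrame({'A': [1, 1], 'B': [2, 2]})

# Deduplicate the right side so each df1 row matches at most one df2 row.
right = df2.drop_duplicates()

merged = df1.merge(right, on=['A', 'B'], how='left', indicator=True)
# '_merge' is 'left_only' for df1 rows with no match in df2.
dfx = merged.loc[merged['_merge'] == 'left_only'].drop(columns='_merge')

# Without deduplication the (1, 2) row matches twice: 5 rows instead of 4.
print(len(df1.merge(df2, how='left')), len(merged))  # 5 4
```

Note that `merge` returns a new RangeIndex, so unlike the boolean-mask approaches the original `df1` index labels are not kept.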
