I am using a solution from this thread to get the difference between two data frames:
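import pandas as pd
df1 = pd.DataFrame({'A': [1, 2, 3, 3], 'B': [2, 3, 4, 4]})
df2 = pd.DataFrame({'A': [1], 'B': [2]})
# Keep the rows of df1 whose (A, B) tuple does not appear in df2
dfx = df1[~df1.apply(tuple, axis=1).isin(df2.apply(tuple, axis=1))]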
Data frame df1 is about 1 GB and df2 is about 100 MB. I get a MemoryError around the call to apply.
Is this normal and what is the workaround, if any?
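
For reference, one alternative that might avoid building a Python tuple for every row is a left merge with `indicator=True`, followed by a filter on the `_merge` column. This is only a minimal sketch on the sample frames above. It assumes both frames share exactly the same columns, and it has not been checked against frames of the real size, so the memory savings are an assumption rather than a measured result.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3, 3], 'B': [2, 3, 4, 4]})
df2 = pd.DataFrame({'A': [1], 'B': [2]})

cols = list(df1.columns)  # assumes df2 has the same columns

# Left-merge on all columns; the _merge column records where each row was found.
# Dropping duplicates from df2 first keeps the merge from multiplying rows of df1.
merged = df1.merge(df2.drop_duplicates(), on=cols, how='left', indicator=True)

# Keep the rows that exist only in df1, then drop the helper column.
dfx = merged.loc[merged['_merge'] == 'left_only', cols]
print(dfx)
```

Note that the merge returns a fresh integer index, so the index of `dfx` refers to positions in the merged frame rather than to the original index of `df1`.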