
I would like to left join df1 and df2 and keep only the records that exist in df1 alone (the left table only). I am trying the code below, but I get a memory error.

df1.merge(df2, indicator='i', how='outer', on=['ID']).query('i == "left_only"').drop(columns='i')

df1:
ID
QA00797310082
IS00797320000
QA12567318888
WS00565610099
MA10897310022

df2:
ID
QA00797310082
IS00797320000
QA12567318888
WS00565610099
MA10897310022

I am trying to join on the ID column, and both dataframes have just one column each. The error is:

Unable to allocate 2.82 GiB for an array with shape (1, 379038888) and data type object

I tried removing all the extra columns, but I still end up with this error.

Is there any other way to get the records that are in the left table only?

unicorn
  • Is it really only 72k+ rows? The error message shows an attempt to allocate an array of shape (1, 379038888), which 72k rows alone would not add up to. – Emma Aug 02 '21 at 20:51
  • Yes, it is just 72k. It gives this memory error only while doing the left join, with no other warning. Is there a way I can increase the size? – unicorn Aug 03 '21 at 04:02
  • Even in the worst case, where all IDs are different, 72k*2 is far less than 379m. You could reduce the joined size somewhat by using `how='left'`, but 379m is still oddly high, and I suspect something else is going on. – Emma Aug 03 '21 at 16:57
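
To make the comment thread concrete, here is a minimal sketch, not a confirmed diagnosis. The error's 379,038,888 object references at 8 bytes each come to about 2.82 GiB, so the merge result itself has roughly 379 million rows. One plausible cause (an assumption, since the real data is not shown) is repeated ID values in both frames: each repeated ID contributes left count times right count rows to the join. The sample frames, the ID `XX00000000001`, and the helper names `merged_row_estimate`, `left_only_merge`, and `left_only_isin` are all hypothetical.

```python
import pandas as pd

# Hypothetical sample data: some IDs are taken from the question,
# the duplicates and "XX00000000001" are made up for illustration.
df1 = pd.DataFrame({"ID": ["QA00797310082", "QA00797310082",
                           "IS00797320000", "XX00000000001"]})
df2 = pd.DataFrame({"ID": ["QA00797310082", "QA00797310082", "QA00797310082",
                           "IS00797320000", "MA10897310022"]})


def merged_row_estimate(left, right, key="ID"):
    """Count the matched rows a join on `key` would produce.

    Each shared key contributes (occurrences in left) * (occurrences in right).
    """
    left_counts = left[key].value_counts()
    right_counts = right[key].value_counts()
    # Keys missing on one side are filled with 0 and contribute nothing.
    return int(left_counts.mul(right_counts, fill_value=0).sum())


def left_only_merge(left, right, key="ID"):
    """Rows of `left` whose key is absent from `right`, via a left join.

    De-duplicating the right-hand keys first keeps the join from multiplying rows.
    """
    right_keys = right[[key]].drop_duplicates()
    merged = left.merge(right_keys, on=key, how="left", indicator="i")
    return merged.query('i == "left_only"').drop(columns="i")


def left_only_isin(left, right, key="ID"):
    """Same result without building a joined frame at all."""
    return left[~left[key].isin(right[key])]


print(merged_row_estimate(df1, df2))  # matched rows the join would create
print(left_only_merge(df1, df2))
print(left_only_isin(df1, df2))
```

If `merged_row_estimate` reports a number far above the row counts of the inputs, duplicate keys are the likely source of the blow-up. The `isin` variant only builds a boolean mask over df1, so its memory use stays proportional to the size of the inputs.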
