I have two very large CSV files, each containing a single column of integers. For every integer in dfA, I need to check whether it also appears in dfB. If it does, I need to remove that value from dfA.
My first thought was to loop through dfA and test each value against dfB, but looping is far too slow.
dfA:
0
0 9312969810
1 3045897298
2 8162414592
3 2030000000
4 7876904982
dfB:
0
0 2030000000
1 2030156119
2 2030389149
3 2030641047
4 2030693850
output:
0
0 9312969810
1 3045897298
2 8162414592
3 7876904982
Since 2030000000 is in dfB, it needs to be removed from dfA.
Does anyone have a better way? Thanks.
Edit: the CSV for dfB is 2 GB and the CSV for dfA is 5 MB.
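For reference, here is a rough sketch of the kind of non-looping approach I have in mind, in case it helps frame an answer. It assumes pandas, that neither file has a header row (so the single column is named 0), and that dfA fits in memory while dfB has to be streamed in chunks. The function name and the chunk size are made up for illustration.

```python
import pandas as pd


def drop_values_present_in_b(path_a, path_b, chunksize=1_000_000):
    """Return dfA without the values that also occur in dfB."""
    # dfA is small (about 5 MB), so load it fully. header=None assumes
    # the file has no header row, which names the single column 0.
    df_a = pd.read_csv(path_a, header=None)
    a_values = set(df_a[0].tolist())

    # dfB is large (about 2 GB), so read it in chunks and remember only
    # the values that also appear in dfA.
    found = set()
    for chunk in pd.read_csv(path_b, header=None, chunksize=chunksize):
        matches = chunk.loc[chunk[0].isin(a_values), 0]
        found.update(matches.tolist())

    # Keep the rows of dfA whose value was never seen in dfB.
    return df_a[~df_a[0].isin(found)].reset_index(drop=True)
```

I would call it as `drop_values_present_in_b("dfA.csv", "dfB.csv")`, where both file names are placeholders. Only the small file and the set of matches are held in memory, so the 2 GB file is never loaded all at once.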