I have two large datasets:
training set: 289816 rows × 689 columns
testing set: 49863 rows × 689 columns
I want to drop rows from the testing set that already exist in the training set.
I tried the approach from this answer: https://stackoverflow.com/a/44706892
but unfortunately the Python process gets killed after filling 144 GB of memory.
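If I understand the linked answer correctly, it boils down to a left merge with an indicator column and keeping the rows unique to the test set, roughly like this (a minimal sketch with toy DataFrames `train` and `test` standing in for my real data):

```python
import pandas as pd

# Toy stand-ins for the real (much larger) DataFrames.
train = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
test = pd.DataFrame({"a": [2, 7], "b": [5, 8]})

# Left merge on all columns with an indicator column, then keep
# only the rows that appear in test but not in train.
merged = test.merge(train.drop_duplicates(), how="left", indicator=True)
test_only = merged[merged["_merge"] == "left_only"].drop(columns="_merge")
```

This works on small data, but the merge materializes an intermediate frame over all 689 columns, which is presumably what blows up the memory at my scale.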
Is there a better solution that is less resource-intensive?
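One direction I am considering is hashing each row down to a single 64-bit value, so the membership check works on one column instead of 689 (a sketch, assuming `pandas.util.hash_pandas_object`; hash collisions are theoretically possible but astronomically unlikely at this row count):

```python
import pandas as pd
from pandas.util import hash_pandas_object

# Toy stand-ins for the real DataFrames.
train = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
test = pd.DataFrame({"a": [2, 7], "b": [5, 8]})

# Hash every row to a single uint64 (index excluded so only the
# cell values matter), then keep test rows whose hash never
# appears among the training-row hashes.
train_hashes = set(hash_pandas_object(train, index=False))
test_hashes = hash_pandas_object(test, index=False)
filtered = test[~test_hashes.isin(train_hashes)]
```

Would something along these lines be a reasonable way to avoid the huge intermediate merge result?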