I have some code like the following:
df = ..... # load a very large dataframe
good_columns = set(['a','b',........]) # set of "good" columns we want to keep
columns = list(df.columns.values)
for col in columns:
    if col not in good_columns:
        # drop returns a new dataframe, which is rebound to df
        df = df.drop(col, axis=1)
The odd thing is that it successfully drops the first column that is not good, so the problem doesn't seem to be that I am holding the old and new dataframes in memory at the same time and running out of space. It fails with a MemoryError on the second column being dropped. This makes me suspect there is some kind of memory leak. How can I prevent this error from happening?
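For comparison, here is a minimal sketch of a single-call variant of the loop above. Each `df.drop` call returns a new dataframe by default, so the loop builds one intermediate copy per dropped column, while this version builds only one. The small dataframe and the column names `x` and `y` are hypothetical stand-ins for the elided data, and whether this avoids the MemoryError on the real dataset is an assumption, not something verified here.

```python
import pandas as pd

# Hypothetical small dataframe standing in for the very large one.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "x": [5, 6], "y": [7, 8]})
good_columns = {"a", "b"}  # hypothetical "good" column names

# Collect every unwanted column first, preserving the original order.
bad_columns = [col for col in df.columns if col not in good_columns]

# Drop them all in one call instead of once per column.
df = df.drop(columns=bad_columns)

print(list(df.columns))
```

Passing the whole list to `drop` means only one new dataframe is created, rather than one per unwanted column.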