Memory leak in pandas when dropping dataframe column?

Question

I have some code like the following:

df = ..... # load a very large dataframe
good_columns = set(['a','b',........]) # set of "good" columns we want to keep
columns = list(df.columns.values)
for col in columns:
    if col not in good_columns:
        df = df.drop(col, axis=1)

The odd thing is that it successfully drops the first column that is not good, so it isn't an issue where I am holding the old and new dataframe in memory at the same time and running out of space. It fails when dropping the second column, raising a MemoryError. This makes me suspect some kind of memory leak. How can I prevent this error?

I am going to try using del instead anyway, but I am curious why this is happening. – Andrew Mar 07 '15 at 00:39
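For comparison, here is a minimal sketch of the `del` approach mentioned above next to a single `drop` call that removes every unwanted column at once. The small DataFrame and the `good_columns` set are hypothetical stand-ins for the question's data. The single call builds one new DataFrame instead of one per dropped column; this is an alternative technique, not something the asker or answerers confirmed fixes the MemoryError.

```python
import pandas as pd

# Hypothetical small stand-in for the "very large dataframe" in the question.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "x": [5, 6], "y": [7, 8]})
good_columns = {"a", "b"}  # hypothetical set of columns to keep

# Option 1: the `del` approach, removing unwanted columns one at a time.
# (copy() is only here so both options start from the same data.)
df_del = df.copy()
for col in list(df_del.columns):  # iterate over a snapshot, since columns change
    if col not in good_columns:
        del df_del[col]

# Option 2: drop every unwanted column in one call, so only one new
# DataFrame is created instead of one per dropped column.
unwanted = [c for c in df.columns if c not in good_columns]
df_single = df.drop(columns=unwanted)

print(list(df_del.columns))     # ['a', 'b']
print(list(df_single.columns))  # ['a', 'b']
```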

score 1 · Accepted Answer · edited Jun 20 '20 at 09:12

1

It may be that you're constantly returning a new, very large DataFrame. Try setting drop's inplace parameter to True.
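A minimal sketch of this suggestion, using hypothetical small data in place of the large DataFrame. With `inplace=True`, `drop` modifies `df` and returns `None`, so the result must not be reassigned. Whether this actually lowers peak memory may depend on the pandas version, since in-place operations can still copy data internally.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "x": [5, 6]})  # hypothetical data
good_columns = {"a", "b"}  # hypothetical set of columns to keep

for col in list(df.columns):
    if col not in good_columns:
        # inplace=True returns None; writing `df = df.drop(..., inplace=True)`
        # would replace df with None.
        df.drop(columns=col, inplace=True)

print(list(df.columns))  # ['a', 'b']
```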

answered Mar 07 '15 at 00:40 by kennes · edited Jun 20 '20 at 09:12 by Community

Yeah, that fixes it. I'm still curious why it doesn't break on the first drop instead of the second, though; you would think that if that were the only cause, it would fail right away. – Andrew Mar 07 '15 at 01:14
I see your point. I'm not sure how memory is handled when Python programs are executed. More specifically: is the memory held by a large variable freed immediately once you reassign that variable? – kennes Mar 07 '15 at 01:30
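A small experiment bears on that last question. The sketch below uses a hypothetical `Big` class as a stand-in for a large DataFrame and a weak reference to watch the first object. It shows two things about a statement like `df = df.drop(...)`: the right-hand side is evaluated before the name is rebound, so old and new objects briefly coexist; and on CPython, reference counting frees the old object as soon as its last reference disappears. It does not by itself explain why the second drop, rather than the first, failed.

```python
import weakref

# Records, for each construction, whether the watched object was still alive.
alive_during_build = []


class Big:
    """Hypothetical stand-in for a large object such as a DataFrame."""

    def __init__(self, n_bytes, watch=None):
        self.payload = bytearray(n_bytes)  # allocate the new payload first
        if watch is not None:
            # Is the watched (old) object still alive while the new one exists?
            alive_during_build.append(watch() is not None)


obj = Big(10_000_000)
first = weakref.ref(obj)  # observe the first object without keeping it alive

# The right-hand side runs before `obj` is rebound, so both payloads are in
# memory at the same time during construction.
obj = Big(10_000_000, watch=first)

print(alive_during_build)  # [True]
# On CPython, rebinding `obj` drops the last strong reference to the first
# object, and reference counting frees it immediately.
print(first() is None)     # True on CPython
```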

score 1 · Answer 2 · answered Mar 07 '15 at 04:55

1

Use the usecols argument when reading the large data frame, so that only the columns you want are loaded in the first place instead of being dropped later. See: http://pandas.pydata.org/pandas-docs/dev/generated/pandas.io.parsers.read_csv.html
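A minimal sketch of `usecols` with `pandas.read_csv`, assuming the data comes from a CSV file. The inline CSV text and `good_columns` set are hypothetical; `io.StringIO` stands in for a real file path. `usecols` accepts either a list of column names or a callable that is evaluated against each column name, and the callable form fits a set like the question's `good_columns` directly.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for the large file.
csv_text = "a,b,x,y\n1,3,5,7\n2,4,6,8\n"
good_columns = {"a", "b"}  # hypothetical set of columns to keep

# Pass an explicit list of column names...
df = pd.read_csv(io.StringIO(csv_text), usecols=["a", "b"])

# ...or a callable that decides, per column name, whether to load it.
df2 = pd.read_csv(io.StringIO(csv_text), usecols=lambda name: name in good_columns)

print(list(df.columns))   # ['a', 'b']
print(list(df2.columns))  # ['a', 'b']
```

Unwanted columns are never loaded, so there is nothing to drop afterwards.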

answered Mar 07 '15 at 04:55 by Mostafa Mahmoud

score 0 · Answer 3 · edited Dec 02 '17 at 06:07

0

I tried the inplace=True argument but still had the same issue. Here's another solution, dealing with the memory leak due to your architecture, that helped me when I had this same problem.

answered Dec 02 '17 at 05:17 by Wish I Knew this stuff · edited Dec 02 '17 at 06:07 by Marcello B.

Memory leak in pandas when dropping dataframe column?

3 Answers3

Linked