
I'm new to Python. Say I'm working with large pandas data frames, and my code looks something like this:

all_data = pd.read_csv(huge_file_name)
part_data = all_data.loc['ColumnName1', 'ColumnName2','ColumnName3']
data_filtered = part_data.loc[:,part_data['ColumnName2']==-1]

and so on. Is there some way for Python to delete all_data, part_data, and other variables that are no longer used? I could write del var_name, but that would make the code very dirty. I could also reuse the same name for all the variables, but that doesn't look good either. Thank you all in advance!

  • Does this answer your question? [How can I explicitly free memory in Python?](https://stackoverflow.com/questions/1316767/how-can-i-explicitly-free-memory-in-python) – AMC Mar 19 '20 at 15:02
  • What is `all_data.loc['ColumnName1', 'ColumnName2','ColumnName3']` for? – AMC Mar 19 '20 at 15:03

1 Answer


The del keyword is the way to do it; I'm not sure there's much to be done about your concern for making the code "dirty." Python people like to say that explicit is better than implicit, and this would be an instance of that.

Otherwise, declare the intermediate variables inside a function, and the space used by those variables will be freed (or rather marked for "garbage collection"; see below) when the function returns.

So you could:

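import gc

import pandas as pd

all_data = pd.read_csv(huge_file_name)
# select the three columns of interest (double brackets return a DataFrame)
part_data = all_data[['ColumnName1', 'ColumnName2', 'ColumnName3']]
# keep only the rows where ColumnName2 equals -1
data_filtered = part_data.loc[part_data['ColumnName2'] == -1]

# drop the names of the intermediates so their memory can be reclaimed
del all_data, part_data

# and if you're impatient for that memory to be freed, like RIGHT now
gc.collect()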

Or you could:

import gc

import pandas as pd

def filter_data(infile):
    # all_data and part_data are local, so they are released on return
    all_data = pd.read_csv(infile)
    part_data = all_data[['ColumnName1', 'ColumnName2', 'ColumnName3']]
    # keep only the rows where ColumnName2 equals -1
    return part_data.loc[part_data['ColumnName2'] == -1]

data_filtered = filter_data(huge_file_name)

# force out-of-scope variables to be garbage collected RIGHT now
gc.collect()

The del keyword removes a name from the current scope so that the object it referred to can be reclaimed once no other references to it remain, but the memory freed this way may not be returned to the operating system immediately. The SO thread AMC helpfully pointed you to has details.
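To see what del actually does to a name versus the object behind it, here is a minimal sketch, assuming CPython (where reference counting reclaims an object as soon as its last reference disappears). The Big class is a hypothetical stand-in for a large object such as a data frame, and weakref is used only to observe the object without keeping it alive.

```python
import weakref


class Big:
    """Hypothetical stand-in for a large object such as a data frame."""

    def __init__(self, n):
        self.payload = bytearray(n)


a = Big(10_000_000)
alias = a                  # a second name bound to the same object
probe = weakref.ref(a)     # observes the object without keeping it alive

del a                      # removes the name 'a' only
still_alive = probe() is not None   # 'alias' still refers to the object

del alias                  # the last reference is now gone
freed = probe() is None    # on CPython the object is reclaimed at this point
```

The practical consequence is that del only helps if nothing else (another variable, a list, a closure) still refers to the same object.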

Garbage collection strategies are PhD-level computer science stuff, but the short version for CPython is that most objects are released by reference counting as soon as the last reference to them goes away. The separate cyclic collector in the gc module, which handles objects that refer to each other, runs automatically based on allocation counts rather than on memory "pressure"; gc.collect() just triggers it on demand.
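The case where gc.collect() makes a visible difference is a reference cycle. The following is a rough sketch, assuming CPython; the Node class is hypothetical, and automatic collection is paused only to make the demonstration deterministic.

```python
import gc
import weakref


class Node:
    """Hypothetical object that can point at another Node."""

    def __init__(self):
        self.other = None


gc.disable()                      # pause automatic cycle collection for the demo
thresholds = gc.get_threshold()   # allocation-count thresholds, one per generation

x, y = Node(), Node()
x.other, y.other = y, x           # reference cycle: x -> y -> x
probe = weakref.ref(x)

del x, y                          # names are gone, but the cycle keeps both objects alive
alive_before = probe() is not None

gc.collect()                      # explicitly run the cycle collector
alive_after = probe() is not None

gc.enable()                       # restore automatic collection
```

Objects that are not part of a cycle never wait for this collector, which is why an explicit gc.collect() is rarely needed after a plain del.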

You were careful to point out that this is a large CSV file being read into a single (Pandas) data structure, but be mindful of the fact that out-of-scope variables are normally automatically garbage collected, and usually you do not need to micro-manage this process yourself.
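As a small illustration of out-of-scope variables being cleaned up without any del, here is a sketch assuming CPython. Scratch and summarize are hypothetical names, and the probes list holds only weak references, so it does not keep anything alive.

```python
import weakref


class Scratch:
    """Hypothetical stand-in for a large intermediate result."""

    def __init__(self, n):
        self.payload = bytearray(n)


probes = []   # weak references only; they do not keep the objects alive


def summarize():
    intermediate = Scratch(5_000_000)          # local, like all_data or part_data
    probes.append(weakref.ref(intermediate))
    return len(intermediate.payload)           # only a small result escapes


result = summarize()
intermediate_freed = probes[0]() is None       # the local was released on return
```

Only what the function returns survives the call, so wrapping a pipeline of intermediates in a function gives the same effect as a series of del statements.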

Here is some background on garbage collection in Python that you may find illuminating, and here is a discussion of other times when del is useful (deleting slices out of a list, for example).

TheDudeAbides