I have a script that takes about 10 minutes to run on my local machine. At the beginning of the script, I have to build some pretty big lists and iterate through them to perform various cleanup functions, which creates new lists of the cleaned data. Is there a standard way to clear/preserve memory while doing this?

My initial thought was to simply reset the variable to an empty list once I am done using it:

lst = [1,2,3, toinfinity..]
clean_lst = [x for x in lst if x < infinity]
lst = []
cleaner_lst = [x for x in clean_lst if x > 100]
clean_lst = []
# etc...
Benjamin James
  • sure ... in general python garbage collection just works – Joran Beasley Dec 29 '16 at 19:26
  • You can also use `del`. http://stackoverflow.com/questions/6146963/when-is-del-useful-in-python – NPE Dec 29 '16 at 19:27
  • Why do you need to perform your own run-time memory management? In Python, as in most languages, once you release the memory, the run-time system will return it to the available heap. Releasing it in Python is simply not having any more references to the data object. – Prune Dec 29 '16 at 19:33
  • @Prune thanks, makes sense. I think I'm getting a memory leak elsewhere in the script that is causing the issue. – Benjamin James Dec 29 '16 at 19:35
  • Can you place a memory analyser on this? I can't suggest one; I'm using in-house tools when I work in Python. – Prune Dec 29 '16 at 19:38

2 Answers

Without more of the surrounding code it's hard to know whether this answer is useful, but assuming you don't need to retain the original list:

Simply replace the lists themselves with the cleaned version:

lst = [1,2,3, toinfinity..]
lst = [x for x in lst if x < infinity]
lst = [x for x in lst if x > 100]

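In general, Python handles garbage collection quite well. In this instance, and again assuming you don't need the originals, keeping each intermediate list under its own name just holds on to memory there's no need to use. Once a name is rebound, the old list has no remaining references and can be reclaimed.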
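
If you want to check this on your own machine, the standard library's tracemalloc module can report how much memory is currently allocated. The following is only a rough sketch: the list size and the second filter are made up for illustration, the exact numbers vary between interpreter versions, and the immediate release on rebinding relies on CPython's reference counting.

```python
import tracemalloc


def traced_now():
    """Return the number of bytes currently traced by tracemalloc."""
    current, _peak = tracemalloc.get_traced_memory()
    return current


tracemalloc.start()
baseline = traced_now()

lst = list(range(200_000))            # stand-in for the big original list
with_original = traced_now()

lst = [x for x in lst if x > 100]     # rebinding drops the last reference to the original
lst = [x for x in lst if x % 2 == 0]  # illustrative second filter, same effect
after_rebinding = traced_now()

del lst                               # explicit release once the data is no longer needed
after_del = traced_now()
tracemalloc.stop()

print(baseline, with_original, after_rebinding, after_del)
```

On CPython the traced size should fall after the rebinding steps and again after `del`. If it does not fall in your real script, something else is probably still holding a reference to the data.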

Chris Larson

Firstly, if I were you, I would profile the code, for example by wrapping the calls you mentioned in a function and running:

import cProfile
cProfile.run("my_function()") 

Then, if I were to focus on the small optimizations you asked about in the post (instead of focusing on the REAL performance hogs!), I'd replace the code in the question with:

some_list = [x for x in range(infinity) if 100 < x < infinity]
# in Python 2, use xrange instead of range

But this would only be a step before I'd think about changing those square brackets into parentheses, which turns the list comprehension into a generator expression. If you could be a little more precise about the requirements, perhaps we could do even better than that.
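
As a minimal sketch of that last step, the two filters from the question can be chained as generator expressions so that no intermediate list is ever built. The function name `clean`, the `upper_bound` parameter, and the sample range are assumptions made for illustration; they are not from the original script.

```python
def clean(values, upper_bound):
    """Lazily yield the values that are below upper_bound and above 100."""
    # Parentheses instead of square brackets: nothing is computed yet.
    below_bound = (x for x in values if x < upper_bound)
    above_100 = (x for x in below_bound if x > 100)
    return above_100


cleaned = clean(range(1_000), upper_bound=105)
result = list(cleaned)  # items are produced one at a time only when consumed
print(result)  # [101, 102, 103, 104]
```

The trade-off is that a generator can be iterated only once, so this fits best when the cleaned data is consumed in a single pass.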

For example, you mentioned that you run various cleanup functions. Why not replace that with something like:

def clean_some_data(data, cleanup_functions):
    return [cleanup_function(data) for cleanup_function in cleanup_functions]

# or

def clean_some_data(data, cleanup_functions):
    for cleanup_function in cleanup_functions:
        data = [cleanup_function(item) for item in data]
    return data
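
If memory is the main concern, the second version could also be written as a generator, so that each item passes through every cleanup function before the next item is touched and no intermediate list exists per step. This is only a sketch: `clean_lazily` and the sample cleanup steps (`str.strip`, `str.lower`) are hypothetical stand-ins for whatever the real script does.

```python
def clean_lazily(data, cleanup_functions):
    """Yield each item after passing it through every cleanup function in order."""
    for item in data:
        for cleanup_function in cleanup_functions:
            item = cleanup_function(item)
        yield item


# Hypothetical cleanup steps, chosen only for illustration.
raw = ["  Foo ", "BAR", " baz"]
cleaned = list(clean_lazily(raw, [str.strip, str.lower]))
print(cleaned)  # ['foo', 'bar', 'baz']
```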

There are other possibilities, but more details would be required.

JustMe