
I've recently finished writing a genetic algorithm. The algo creates a list of 50 test cases, runs each trial through a simulation, and stores the result. I use a number of custom classes, and the stored results contain lists of class instances.

When I run this simulation, beyond taking forever, it pushes my memory usage up to 90+%. Something in my code must be hogging memory, but I'm not sure how to find it. Here is the main part of my code, with its loops inside loops inside loops...

    # ----------------------------------------------------------------------------------------------------
    for i in range(1,numberOfRunsToSolve,1): # begin genetic algo loop
        temp = trial_cases
        for ii, stored_trial in enumerate(temp): # run through stored trial cases
            new_trials = []
            for jj in range(1,numberOfTrialsPerRound):
                tc = []
                tc = randomTrials_GenAlgo_ARRAY(stored_trial, True) # create new trial set, based on indexed stored results
                new_trials.append(tc)
            print(new_trials)

            for j, trial in enumerate(new_trials):
                x = OneGenerationSimulation(trial) #returns [ObjArray, ErrorArray]
                rev = revenueAndLoss(x[0])
                DEBUG_ARRAY.append(str(revenue)+' trial: ' + str(trial))
                results.append((revenue, trial, x[0],x[1]))


        results.sort(reverse=True)  # sort to bring best revenue to top
        results = results[0:numberOfResultsToKeepPerRound-1] 

        trial_cases = []
        for i, r in enumerate(results):
            trial_cases.append(r[1])

        # end of the genetic algo loop
    # ----------------------------------------------------------------------------------------------------

Any suggestions on how to track memory usage in my script and hunt down the culprits? I'm pretty new to Python, so feel free to state the obvious.
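
For reference, a rough sketch of the kind of tracking I have in mind, using the standard-library tracemalloc module (Python 3.4+) wrapped around one round of the loop above (untested, not my real code):

    import tracemalloc

    tracemalloc.start()
    baseline = tracemalloc.take_snapshot()

    # ... one full pass of the genetic algo loop above runs here ...

    snapshot = tracemalloc.take_snapshot()
    # compare against the baseline and show the source lines that grew the most
    for stat in snapshot.compare_to(baseline, 'lineno')[:10]:
        print(stat)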

EDIT: The process above is essentially doing this:

1. Create 50 trial runs.
2. Run a simulation on each trial. Each simulation creates hundreds of custom objects, runs a script on them, and returns the results.
3. From all the results that come back, keep the best 5 outcomes.
4. From those 5 outcomes, create new trial sets and repeat the process.

I'm worried that the mass-object instance creation, which is then filtered down to the best 5 results, isn't being cleaned up properly in memory or something... and that all those objects are hiding in the background....
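
To make that worry concrete, this is the pattern I mean (a made-up toy class below, not my real code); my understanding is that once the trimmed results list is the only thing left referencing the per-trial objects, everything outside the kept 5 should become collectable:

    import gc

    class FakeSimObject:
        """Stand-in for the custom classes the simulation creates."""
        def __init__(self):
            self.payload = [0.0] * 10000   # pretend this is the heavy per-object state

    results = []
    for trial_id in range(50):
        objs = [FakeSimObject() for _ in range(200)]   # "hundreds of custom objects" per trial
        results.append((trial_id, objs))               # storing objs keeps every object alive

    results.sort(reverse=True)
    results = results[:5]   # the other 45 trials' objects are now unreferenced...
    gc.collect()            # ...and can be reclaimed here (or whenever GC next runs)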

Thanks - KC.

keynesiancross
  • first place I'd look is recursion, but if I understand correctly, then you've already figured out that these loops are definitely the problem; is that right? – ScegfOd Feb 28 '17 at 19:25
  • yeah - definitely. The custom object stuff may not be the most efficient either, but there isn't much code besides what you see above (I'm doing this in Jupyter, fwiw) – keynesiancross Feb 28 '17 at 19:28
  • One thing I'd try is to see if those giant arrays of arrays are causing problems by trying smaller numbers (for `numberOfTrialsPerRound` and `numberOfRunsToSolve`) and seeing if that makes a big impact. I didn't see anything obviously wasteful yet though... – ScegfOd Feb 28 '17 at 19:29
  • Where does "revenue" get set? Did you mean "rev"? I'm not sure how many results you are getting versus how many you are keeping. However, instead of getting all the results, sorting them all, and then throwing some away, what about just keeping the ones you need? I.e. do an "insertion sort" type thing. Can you get rid of DEBUG_ARRAY and see if it helps? If memory is a concern, you could see if garbage collecting helps. Also, no reason to use `enumerate` in your final loop, just have `trial_cases = [r[1] for r in results]` – RobertB Feb 28 '17 at 19:34
  • sorry - you're right. Revenue = rev. I was trying to shorten my variables in the above. – keynesiancross Feb 28 '17 at 19:35
  • I like RobertB's insertion sort idea; depending on how big the list is and how much you're throwing away, the sorting could be wasting a lot of time and memory. – ScegfOd Feb 28 '17 at 19:38
  • fair point - I can probably think of another way to do that (there isn't some predefined python function for that is there?) – keynesiancross Feb 28 '17 at 19:43
  • Related question - I have a number of Prints coming out of my custom objects. I've used 'blockPrint()' [from: http://stackoverflow.com/questions/8391411/suppress-calls-to-print-python], but could those be stacking up in the background or something? Do printed-out statements take up memory? – keynesiancross Feb 28 '17 at 20:18
  • Re: insertion sort. Nothing standard, I will post an answer with some code to get you started. Re: blockPrint... I don't think that will use memory, no. – RobertB Feb 28 '17 at 20:31
  • I tried some of the above suggestions, but it's still not helping the memory problem. I made an edit to try and give more detail... fwiw – keynesiancross Feb 28 '17 at 20:47
  • Did you try to force garbage collection? Try `import gc` and then put `gc.collect()` inside your loop. That forces a garbage collection. – RobertB Feb 28 '17 at 21:01
  • cool - I'll give it a go. I actually had an error in my first implementation of the sorted insert... I'm rerunning that now, and so far the memory looks way better... so far... – keynesiancross Feb 28 '17 at 21:05

1 Answer


Here is a quick and dirty insertion sort you could use. Rather than a function, you could inline this code to avoid function call overhead.

    def top_insert(my_list, item, keep_only=10):
        # assumes my_list is already sorted by item[0] (revenue), highest first,
        # and never longer than keep_only entries
        rev = item[0]
        newlist = [i for i in my_list if i[0] > rev]   # everything that beats the new item
        if len(newlist) >= keep_only:
            # the new item doesn't make the cut
            return newlist[:keep_only]
        elif len(newlist) == (keep_only - 1):
            # the new item takes the last slot
            return newlist + [item]
        else:
            # insert the new item, then keep the old tail up to keep_only entries in total
            return newlist + [item] + my_list[len(newlist):keep_only - 1]
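
Assuming your results tuples keep the (revenue, trial, ObjArray, ErrorArray) shape from your loop, calling it would look roughly like this (a sketch, with names taken from your code):

    results = []
    for j, trial in enumerate(new_trials):
        x = OneGenerationSimulation(trial)   # returns [ObjArray, ErrorArray]
        rev = revenueAndLoss(x[0])
        # keep only the best few as you go, instead of appending everything and sorting later
        results = top_insert(results, (rev, trial, x[0], x[1]),
                             keep_only=numberOfResultsToKeepPerRound)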
RobertB
  • Based on the edit, this is probably not the solution after all. Sorting 50 things wouldn't be a huge performance hit. – RobertB Feb 28 '17 at 21:04
  • I think the issue is that each of those items that it's sorting contains a ton of info / uses a ton of memory. See my other question here: http://stackoverflow.com/questions/42517635/python-query-over-list-of-tuples/42518373#42518373 Once I did a sorted insert, it kept the number of items I was storing way down – keynesiancross Feb 28 '17 at 21:40