
Short

My Python program occupies much more memory than expected and than what memory profiling tools report. I need a strategy to find the memory leak and fix it.

Detailed

I am running a Python 3 script on a 64-bit Linux machine. Almost all of the code is bundled within one object:

obj = MyObject(*myArguments)
result = obj.doSomething()
print(result)

During the creation of obj, the program reads a text file of about 100 MB. Since I store the information in multiple ways, I expect the whole object to occupy a couple of hundred MB of memory.

Indeed, measuring its size with asizeof.asized(obj) from the pympler package returns around 123 MB. However, top tells me that my program occupies around 1 GB of memory.
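This is roughly how I compare the two numbers (a sketch, not my exact measurement code; the resource module is part of the standard library, and on Linux ru_maxrss is reported in kB):

import resource
from pympler import asizeof

# deep size of the object as reported by pympler, in MB
print("asizeof:", asizeof.asized(obj).size / 1024**2, "MB")

# peak resident set size of the whole process, in MB (Linux reports kB)
print("peak RSS:", resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024, "MB")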

I understand that local variables in methods will occupy further RAM. However, looking through my code I see that none of these local variables can be that big. I double-checked this using asizeof.asized again.

It is not a major concern to me that the script needs 1 GB of memory. However, I execute some methods in parallel (in 12 processes):

class MyObject():

    def doSomething(self, arg):
        # do something
        pass

    def myParallelMethod(self, args):
        with sharedmem.MapReduce() as pool:
            result = pool.map(self.doSomething, args)
        return result

This makes the total memory usage grow to 8 GB, even though I put all large objects into shared memory:

self.myLargeNumPyArray = sharedmem.copy(self.myLargeNumPyArray)

I verified with test programs that the memory is really shared.
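For completeness, this is the kind of test I mean (a sketch, assuming sharedmem.copy and sharedmem.MapReduce behave as used above): a worker writes into the shared array and the parent checks that the write is visible.

import numpy as np
import sharedmem

shared = sharedmem.copy(np.zeros(4))

def touch(i):
    # write into the shared buffer from a worker process
    shared[i] = i + 1
    return i

with sharedmem.MapReduce() as pool:
    pool.map(touch, range(4))

# if the memory is really shared, the parent sees the workers' writes
print(shared)  # expected: [ 1.  2.  3.  4.]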

Checking with asizeof in each subprocess, I found that

  • asizeof.asized(self) is 1 MB (i.e. much smaller than the "original" object, probably because the shared memory is not counted twice), and
  • asizeof.asized(myOneAndOnlyBigLocalVariable) is 230 MB.

All in all, my program should occupy not much more than 123 MB + 12 * 230 MB ≈ 2.8 GB << 8 GB. So why does the program require so much memory?

One explanation could be that there are some hidden parts (garbage?) in my object that are being copied when the program runs in parallel.

Does anyone know a strategy to find out where the memory leak is? How could I fix it?

I have read many threads regarding memory profiling, e.g. Profiling memory in python 3, Is there any working memory profiler for Python3, Which Python memory profiler is recommended?, or How do I profile memory usage in Python?, but none of the recommended tools explain the memory usage.
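For example, a typical check of the kind these threads suggest, using the standard-library tracemalloc, looks like the sketch below. Since tracemalloc only sees allocations made through Python's allocator, it does not account for the missing memory either.

import tracemalloc

tracemalloc.start()

obj = MyObject(*myArguments)
result = obj.doSomething()

snapshot = tracemalloc.take_snapshot()
# the ten call sites with the largest Python-level allocations
for stat in snapshot.statistics("lineno")[:10]:
    print(stat)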


Update

I have been asked to provide a minimal example of the code. The code below shows the same problems with memory consumption in the parallel part as my original code. I have already figured out the issue with the non-parallel part, which was that I had a large numpy array with dtype object as an instance variable. Due to this data type, the array cannot be put into shared memory, and asized only returns its shallow size. Thanks to @user2357112 for helping me figure this out!
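To illustrate the non-parallel issue for other readers: if asizeof really stops at the array itself, the reported size of an object-dtype array stays close to its shallow size, no matter how large the referenced objects are. A small sketch (the exact numbers depend on the pympler version):

import numpy as np
from pympler import asizeof

# 1000 object references, each pointing to a fairly large Python list
arr = np.empty(1000, dtype=object)
arr[:] = [list(range(1000)) for _ in range(1000)]

# size reported for the object array vs. the same data as nested lists
print("object array:   ", asizeof.asized(arr).size)
print("as nested lists:", asizeof.asized(arr.tolist()).size)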

I would therefore like to concentrate on the issue in the parallel part: inserting values into queue in the method singleSourceShortestPaths (marked with a comment below) changes the memory consumption from around 1.5 GB to 10 GB. Does anyone have an idea how to explain this behaviour?

import numpy as np
from heapdict import heapdict
from pympler import asizeof
import sharedmem

class RoadNetwork():

    strType = "|S10"

    def __init__(self):

        vertexNo = 1000000
        self.edges = np.zeros(1500000, dtype={"names":["ID", "from_to", "from_to_original", "cost", "inspection", "spot"], 
                                              'formats':[self.strType, '2int', '2'+self.strType, "double", "3bool", "2int"]})
        self.edges["ID"] = np.arange(self.edges.size)
        self.edges["from_to_original"][:vertexNo, 0] = np.arange(vertexNo)
        self.edges["from_to_original"][vertexNo:, 0] = np.random.randint(0, vertexNo, self.edges.size-vertexNo)
        self.edges["from_to_original"][:,1] = np.random.randint(0, vertexNo, self.edges.size)

        vertexIDs = np.unique(self.edges["from_to_original"])
        self.vertices = np.zeros(vertexIDs.size, {"names":["ID", "type", "lakeID"], 
                                                  'formats':[self.strType, 'int', self.strType]})

    def singleSourceShortestPaths(self, sourceIndex):

        vertexData = np.zeros(self.vertices.size, dtype={"names":["predecessor", "edge", "cost"], 
                                         'formats':['int', "2int", "double"]})

        queue = np.zeros((self.vertices.size, 2), dtype=np.double)

        # Crucial line!! Commenting this out decreases memory usage by 7 GB in the parallel part
        queue[:,0] = np.arange(self.vertices.size)

        queue = heapdict(queue)

        print("self in singleSourceShortestPaths", asizeof.asized(self))
        print("queue in singleSourceShortestPaths", asizeof.asized(queue))
        print("vertexData in singleSourceShortestPaths", asizeof.asized(vertexData))

        # do stuff (in my real program, Dijkstra's algorithm would follow)
        # I inserted the loop below as an ugly substitute for 'wait()' to
        # give me enough time to measure the memory consumption in 'top'
        for i in range(10000000000):
            pass

        return vertexData

    def determineFlowInformation(self):
        print("self in determineFlowInformation", asizeof.asized(self))
        f = lambda i: self.singleSourceShortestPaths(i)
        self.parmap(f, range(30))

    def parmap(self, f, argList):
        """
        Executes f(arg) for arg in argList in parallel
        returns a list of the results in the same order as the 
        arguments, invalid results (None) are ignored
        """
        self.__make_np_arrays_sharable()
        with sharedmem.MapReduce() as pool:
            results, to_do_list = zip(*pool.map(f, argList))
        return results

    def __make_np_arrays_sharable(self):
        """
        Replaces all numpy array object variables, 
        which should have the same  
        behaviour / properties as the numpy array
        """
        varDict = self.__dict__
        for key, var in varDict.items():
            if type(var) is np.ndarray:
                varDict[key] = sharedmem.copy(var)

if __name__ == '__main__':
    network = RoadNetwork()

    print(asizeof.asized(network, detail=1))
    for key, var in network.__dict__.items():
        print(key, asizeof.asized(var))

    network.determineFlowInformation()
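To narrow down where the extra memory in each worker comes from, I plan to print the resident set size from inside singleSourceShortestPaths, right before and right after queue is built. The Linux-only helper below is what I would use (a sketch; it parses /proc/<pid>/status, and note that VmRSS also counts shared pages):

import os

def rss_mb():
    """Resident set size of the current process in MB (Linux only)."""
    with open("/proc/%d/status" % os.getpid()) as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1]) / 1024.0  # VmRSS is given in kB
    return float("nan")

# e.g. inside singleSourceShortestPaths:
# print("RSS before building queue:", rss_mb(), "MB")
# queue = heapdict(queue)
# print("RSS after building queue:", rss_mb(), "MB")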
  • Does `top` display memory in bytes or bits? – Rushy Panchal Apr 30 '16 at 04:01
  • 1
    Start simplifying your code and tearing out any functionality that doesn't make the problem go away when you tear it out, until the result is something compact enough that you can reasonably post it and expect us to read the whole thing, then post it. We can't usually debug code we can't see. – user2357112 Apr 30 '16 at 04:02
  • Can you do binary search debugging? cut off parts of your program till it's not 1Gb any more, then see which cut was the key? – Amadan Apr 30 '16 at 04:03
  • 1
    @RushyPanchal: `top` displays the total memory usage in KB. The memory used by a single process is given in %. However, this number is not reliable, since shared memory may be counted multiple times. I determined the rough memory usage of my program by subtracting the usage before and while execution. – Samufi Apr 30 '16 at 04:08
  • @user2357112: I provided a minimal example now. – Samufi May 01 '16 at 03:10
  • @Amadan: I identified two lines that seem to be crucial for memory consumption (see my edited post). However, I still do not know, **why** they cause the problems. Any ideas? – Samufi May 01 '16 at 03:12
  • 1
    A few things I'm seeing: first, `self.connections` has object dtype, but `self.__make_np_arrays_sharable` tries to put it into shared memory. I don't know what exactly happens when you do that, but it seems guaranteed to be messy, since the contained objects can't be shared. – user2357112 May 01 '16 at 06:30
  • 1
    Second, `asizeof` doesn't handle extension types well. It doesn't know how to follow all the references from the `self.connections` array to the array's contents, so you only get the shallow size of the array. (Also, that's an array, not a list.) – user2357112 May 01 '16 at 07:30
  • Third, it seems kind of weird that you replaced the Dijkstra bit with a 10 billion iteration loop, as opposed to `pass` or just the comment above the loop. That sounds like it would have slowed down testing a lot, and I'm not sure what sort of benefit it might have provided. (You *did* run the code as posted, 10-billion-iteration loop and all, right?) – user2357112 May 01 '16 at 07:43
  • Thanks @user2357112! I understand the issue with the array in `createListView` and edited my post correspondingly. I will find a clean solution to put it into shared memory properly. I was not aware that `asizeof` only returns the shallow size for this array. Do you have any idea for the remaining issue, too? – Samufi May 01 '16 at 17:05
  • To answer your third question @user2357112: I needed the program not to terminate in order to giving me time to measure the memory consumption with top. I admit that the loop is a very ugly solution for that. – Samufi May 01 '16 at 17:06
