Short
My Python program occupies much more memory than I expect, and much more than memory profiling tools report. I need a strategy to find the memory leak and fix it.
Detailed
I am running a Python 3 script on a 64-bit Linux machine. Almost all code is bundled within one object:
obj = MyObject(*myArguments)
result = obj.doSomething()
print(result)
During the creation of obj, the program reads a text file with a size of ca. 100 MB. Since I save the information in multiple ways, I expect the whole object to occupy a couple of hundred MB of memory.
Indeed, measuring its size with asizeof.asized(obj) from the package pympler returns around 123 MB. However, top tells me that my program occupies around 1 GB of memory.
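To rule out misreading top, both numbers can also be taken programmatically; on Linux, resource.getrusage reports the peak resident set size of the process (report below is only an illustrative helper, not part of my program):

import resource

from pympler import asizeof

def report(obj):
    # Deep object size as pympler sees it
    print("asizeof: %.1f MB" % (asizeof.asizeof(obj) / 2**20))
    # Peak resident set size of the whole process; Linux reports kilobytes
    rss_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    print("peak RSS: %.1f MB" % (rss_kb / 2**10))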
I understand that local variables in methods will occupy further RAM. However, looking at my code, I see that none of these local variables can be that big. I double-checked this using asizeof.asized again.
It is not a major concern to me that the script needs 1 GB of memory. However, I execute some methods in parallel (in 12 processes):
class MyObject():
    def doSomething(self, arg):
        # do something
        ...

    def myParallelMethod(self, args):
        with sharedmem.MapReduce() as pool:
            result = pool.map(self.doSomething, args)
        return result
This makes the total memory usage grow to 8 GB, even though I put all large objects in shared memory:
self.myLargeNumPyArray = sharedmem.copy(self.myLargeNumPyArray)
I verified with test programs that the memory is really shared.
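Stripped down, such a test looks like the following (a sketch assuming Linux fork semantics, as used by sharedmem): the child writes into the array, and the parent sees the change only if the memory is truly shared:

import os

import numpy as np
import sharedmem

arr = sharedmem.copy(np.zeros(10))

pid = os.fork()
if pid == 0:
    # Child process: write into the supposedly shared array and exit
    arr[0] = 42.0
    os._exit(0)

os.waitpid(pid, 0)
# Prints 42.0 only if the array is backed by shared memory
print(arr[0])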
Checking with asizeof, I obtained in each subprocess that

- asizeof.asized(self) is 1 MB (i.e. much smaller than the "original" object, maybe due to the shared memory, which is not double-counted), and
- asizeof.asized(myOneAndOnlyBigLocalVariable) is 230 MB.
All in all, my program should occupy not much more than 123 MB + 12 * 230 MB ≈ 2.8 GB << 8 GB. So why does the program require so much memory?
One explanation could be that there are some hidden parts (garbage?) of my object that are being copied when the program is run in parallel.

Does anyone know a strategy to find out where the memory leak is? How could I fix it?

I have read many threads regarding memory profiling, e.g. Profiling memory in python 3, Is there any working memory profiler for Python3, Which Python memory profiler is recommended?, or How do I profile memory usage in Python?, but none of the recommended tools explain the memory usage.
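The most direct narrowing-down step I can think of is to read each process's resident set size at chosen points inside the workers, so that the growth can be attributed to a specific statement instead of being read off top for the whole program. A sketch (Linux only; rss_mb is just a throwaway helper that parses /proc/self/status):

def rss_mb():
    # Parse VmRSS from /proc/self/status (Linux only); the value is in kB
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1]) / 1024
    return float("nan")

# Used inside a worker, e.g.:
# print("before filling queue:", rss_mb(), "MB")
# queue[:, 0] = np.arange(self.vertices.size)
# print("after filling queue: ", rss_mb(), "MB")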
Update
I have been asked to provide a minimal example of the code. The code below shows the same problems with memory consumption in the parallel part as my original one. I have already figured out the issue with the non-parallel part of my code: I had a large numpy array with data type object as an object variable. Due to this data type, the array cannot be put into shared memory, and asized returns only its shallow size. Thanks to @user2357112 for helping me figure this out!
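For illustration, this effect can be reproduced in isolation (a small sketch, independent of my program; the exact numbers depend on the numpy and pympler versions):

import numpy as np
from pympler import asizeof

plain = np.zeros(1000, dtype=np.double)
boxed = np.empty(1000, dtype=object)
boxed[:] = [list(range(100)) for _ in range(1000)]

# The 8000-byte buffer of the double array is counted directly
print("double array:", asizeof.asized(plain).size)
# For the object array only the buffer of pointers is counted
# (the shallow-size effect described above); the 1000 lists
# behind the pointers do not show up
print("object array:", asizeof.asized(boxed).size)
print("a single one of the lists:", asizeof.asized(boxed[0]).size)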
I would therefore like to concentrate on the issue in the parallel part: inserting values into queue in the method singleSourceShortestPaths (marked with a comment in the listing below) changes the memory consumption from around 1.5 GB to 10 GB. Are there any ideas how to explain this behaviour? A small isolation test for this line follows after the listing.
import numpy as np
from heapdict import heapdict
from pympler import asizeof
import sharedmem

class RoadNetwork():
    strType = "|S10"

    def __init__(self):
        vertexNo = 1000000
        self.edges = np.zeros(1500000, dtype={"names": ["ID", "from_to", "from_to_original",
                                                        "cost", "inspection", "spot"],
                                              "formats": [self.strType, "2int", "2"+self.strType,
                                                          "double", "3bool", "2int"]})
        self.edges["ID"] = np.arange(self.edges.size)
        self.edges["from_to_original"][:vertexNo, 0] = np.arange(vertexNo)
        self.edges["from_to_original"][vertexNo:, 0] = np.random.randint(0, vertexNo, self.edges.size-vertexNo)
        self.edges["from_to_original"][:, 1] = np.random.randint(0, vertexNo, self.edges.size)
        vertexIDs = np.unique(self.edges["from_to_original"])
        self.vertices = np.zeros(vertexIDs.size, dtype={"names": ["ID", "type", "lakeID"],
                                                        "formats": [self.strType, "int", self.strType]})

    def singleSourceShortestPaths(self, sourceIndex):
        vertexData = np.zeros(self.vertices.size, dtype={"names": ["predecessor", "edge", "cost"],
                                                         "formats": ["int", "2int", "double"]})
        queue = np.zeros((self.vertices.size, 2), dtype=np.double)

        # Crucial line!! Commenting this out decreases memory usage
        # by 7 GB in the parallel part
        queue[:, 0] = np.arange(self.vertices.size)

        queue = heapdict(queue)
        print("self in singleSourceShortestPaths", asizeof.asized(self))
        print("queue in singleSourceShortestPaths", asizeof.asized(queue))
        print("vertexData in singleSourceShortestPaths", asizeof.asized(vertexData))
        # do stuff (in my real program Dijkstra's algorithm would follow)

        # I inserted these lines as an ugly version of 'wait()' to
        # give me enough time to measure the memory consumption in 'top'
        for i in range(10000000000):
            pass

        return vertexData

    def determineFlowInformation(self):
        print("self in determineFlowInformation", asizeof.asized(self))
        f = lambda i: self.singleSourceShortestPaths(i)
        self.parmap(f, range(30))

    def parmap(self, f, argList):
        """
        Executes f(arg) for arg in argList in parallel and
        returns a list of the results in the same order as
        the arguments
        """
        self.__make_np_arrays_sharable()
        with sharedmem.MapReduce() as pool:
            results = pool.map(f, argList)
        return results

    def __make_np_arrays_sharable(self):
        """
        Replaces all numpy array object variables with shared-memory
        copies, which have the same behaviour / properties as the
        original numpy arrays
        """
        varDict = self.__dict__
        for key, var in varDict.items():
            if type(var) is np.ndarray:
                varDict[key] = sharedmem.copy(var)

if __name__ == '__main__':
    network = RoadNetwork()
    print(asizeof.asized(network, detail=1))
    for key, var in network.__dict__.items():
        print(key, asizeof.asized(var))
    network.determineFlowInformation()
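As the promised isolation test for the marked line (a sketch with a smaller array; the asizeof numbers will of course differ from the full-size run): without the filling line, every row of queue carries the key 0.0, so the heapdict collapses to a single entry, whereas with it every key is distinct and ends up as a separate scalar object on the Python heap.

import numpy as np
from heapdict import heapdict
from pympler import asizeof

n = 100000  # smaller than in the real program, for a quick check

queue = np.zeros((n, 2), dtype=np.double)
# Comment the next line out to mimic the cheap variant: all keys
# collapse to 0.0 and the heapdict keeps a single entry
queue[:, 0] = np.arange(n)

hd = heapdict(queue)
print("entries:", len(hd))
print("heapdict: %.1f MB" % (asizeof.asized(hd).size / 2**20))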