
I am facing a memory leak that occurs in a thread. My code structure looks like this:

while True:
    with self._lock:
        if self.valid_data:
            pass  # do something

Memory leak

How do I delete a list completely?

My Solution
self.__data.clear()
self.__data = []

Problem

How can I clear memory efficiently in Python for list and deque data structures? Here, 3 consumer threads use the data and one reset thread clears the data based on some flag. Should I use del self.__data, or is there a better way?

It happens very rarely, which makes it hard to reproduce, so I am hoping that better cleanup code will resolve the issue. Please advise.

Other approaches

martineau
ajayramesh

1 Answer


Clearing, reassigning and deleting are all about the same. They will spend the vast majority of their time removing references and hopefully deleting their contained objects.

That is, assuming there are no other references to these objects. If there is another reference to the list or deque, reassigning or deleting your name does nothing except decrement the container's reference count; the container and its contents stay alive. You'd have to do del mylist[:] or mydeque.clear() in that case, which empties the container in place for every reference. If the contained objects are referenced somewhere else, they won't really be deleted either.
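To make the difference concrete, here is a minimal sketch. The names (`data`, `consumer_view`, `queue` and so on) are made up for illustration; `consumer_view` stands in for a reference that a consumer thread might still be holding.

```python
from collections import deque

# Rebinding: only the name 'data' changes; the old list lives on.
data = [1, 2, 3]
consumer_view = data              # a second reference to the same list
data = []                         # 'data' now points at a new, empty list
rebinding_kept_items = len(consumer_view)

# In-place clearing: the list is emptied for every reference to it.
other = [1, 2, 3]
other_view = other
del other[:]                      # equivalent to other.clear()
inplace_kept_items = len(other_view)

# deque has no slice deletion, so use clear().
queue = deque([1, 2, 3])
queue_view = queue
queue.clear()
deque_kept_items = len(queue_view)

print(rebinding_kept_items, inplace_kept_items, deque_kept_items)  # 3 0 0
```

If the reset thread only rebinds `self.__data`, any consumer that grabbed the old list beforehand keeps it (and its contents) alive until that consumer drops its reference.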

Although objects are usually deleted right away, if they have circular references (a.b = b and b.a = a) the garbage collector needs to run to collect them. It's automagical in the background, but calling gc.collect() right after a large delete intended to clear memory is reasonable. You could then also check len(gc.garbage), which holds dead objects that the collector found but could not free (on Python 3.4 and later this list is normally empty). Anything over 0 is kinda bad but mostly harmless until the number gets large. Then you have to figure out why your code creates dead objects (hard).
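Here is a small sketch of that behaviour, assuming CPython. `Node` is a hypothetical class that exists only to build a cycle, and the automatic collector is switched off for a moment so the demo is deterministic.

```python
import gc
import weakref


class Node:
    """Hypothetical class used only to build a reference cycle."""

    def __init__(self):
        self.other = None


gc.collect()        # start from a clean slate
gc.disable()        # keep the automatic collector out of the demo

a, b = Node(), Node()
a.other, b.other = b, a          # a and b now reference each other
probe = weakref.ref(a)           # lets us observe 'a' without keeping it alive

del a, b                         # no external references remain...
alive_before = probe() is not None   # ...but the cycle keeps both alive

found = gc.collect()             # number of unreachable objects found
alive_after = probe() is not None    # the cycle has been freed

gc.enable()
print(alive_before, alive_after, found)
```

Refcounting alone cannot free the pair, which is why `alive_before` is true; after `gc.collect()` the weak reference is dead. The exact value of `found` depends on the Python version, so treat it as a rough signal rather than an exact count.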

But is dumping the list the right thing to do? What does that mean for the validity of the program output? When you detect a problem, it may be better to raise an exception and run.

tdelaney
  • My app does batch processing; after a consumer finishes its processing, it just sets a flag on the object. The reset thread looks for that flag and deletes everything. Let me try `del list[:]`. So basically I don't want any of the data at all, but I can restart the app. Is there any other way I can invoke garbage collection, like in Java? – ajayramesh Aug 27 '20 at 15:35
  • Assuming you are using CPython (the C implementation of Python), it's mostly not necessary. When an object's reference count goes to zero, it is deleted immediately. But sometimes objects keep each other alive: if two objects reference each other, their counts stay at 1 apiece even though nothing external references them. After clearing out the list/deque you can call `gc.collect()` to get rid of them. This function returns the number of unreachable objects it found. If it's a large number, that suggests you have a lot of circular references in your objects, and it may be a good idea to get rid of them. – tdelaney Aug 27 '20 at 16:19
  • You can also call `gc.get_stats()` to find out what collections are like. And also check `len(gc.garbage)`, which is a list of uncollectable objects. That should be near zero. If it's large, you have a problem with a bunch of dead objects that could not be freed. – tdelaney Aug 27 '20 at 16:20
  • let me try those recommendations. Liking `gc` lib, finally :) – ajayramesh Aug 27 '20 at 16:33
  • Yes, I have `3` dead objects ... figuring out why – ajayramesh Aug 27 '20 at 17:22
  • @ajayramesh - 3 dead with a large dataset is likely okay. Check their types. If they are containers with lots of objects then those objects may still be alive. There are edge conditions where an object's `__del__` isn't called (especially common in C extensions). The type of the objects would give a hint about that. Generally, 3 on a large job is fine. – tdelaney Aug 27 '20 at 17:28