237

I have a long-running script which, if left to run long enough, will consume all the memory on my system.

Without going into details about the script, I have two questions:

  1. Are there any "Best Practices" to follow, which will help prevent leaks from occurring?
  2. What techniques are there to debug memory leaks in Python?
Fragsworth
    I have found [this recipe](http://code.activestate.com/recipes/65333/) helpful. – David Schein Sep 17 '09 at 00:30
  • It seems to print out way too much data to be useful – Casebash Oct 29 '09 at 00:37
    @Casebash: If that function prints anything you're seriously doing it wrong. It lists objects with `__del__` method that are no longer referenced except for their cycle. The cycle cannot be broken, because of issues with `__del__`. Fix it! – Helmut Grohne Nov 02 '10 at 14:57
  • Possible duplicate of [How do I profile memory usage in Python?](https://stackoverflow.com/questions/552744/how-do-i-profile-memory-usage-in-python) – Don Kirkby Aug 14 '17 at 16:33

9 Answers

118

Have a look at this article: Tracing python memory leaks

Also, note that the garbage collection module (`gc`) can have debug flags set; look at the `set_debug` function. Additionally, look at this code by Gnibbler for determining the types of objects that have been created after a call.
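
For reference, a minimal sketch of turning on the collector's debug flags with the standard `gc` module; objects the collector finds but keeps end up in `gc.garbage` for inspection:

import gc

# DEBUG_LEAK combines DEBUG_COLLECTABLE, DEBUG_UNCOLLECTABLE and DEBUG_SAVEALL:
# the collector reports what it finds and keeps it in gc.garbage for inspection
gc.set_debug(gc.DEBUG_LEAK)

# ... run the code you suspect of leaking ...

gc.collect()
print(len(gc.garbage), "objects held in gc.garbage")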

feuGene
ChristopheD
104

I tried out most of the options mentioned previously, but found this small and intuitive package to be the best: pympler

It's quite straightforward to trace objects that were not garbage-collected; check this small example:

Install the package via `pip install pympler`

from pympler.tracker import SummaryTracker
tracker = SummaryTracker()

# ... some code you want to investigate ...

tracker.print_diff()

The output shows you all the objects that have been added, plus the memory they consumed.

Sample output:

                                 types |   # objects |   total size
====================================== | =========== | ============
                                  list |        1095 |    160.78 KB
                                   str |        1093 |     66.33 KB
                                   int |         120 |      2.81 KB
                                  dict |           3 |       840 B
      frame (codename: create_summary) |           1 |       560 B
          frame (codename: print_diff) |           1 |       480 B

The package provides a number of further features. Check pympler's documentation, in particular the section Identifying memory leaks.
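
For instance, here is a rough sketch (assuming the documented muppy/summary API) of taking a one-off summary of every object currently alive, rather than a diff:

from pympler import muppy, summary

all_objects = muppy.get_objects()               # every object the interpreter currently tracks
summary.print_(summary.summarize(all_objects))  # grouped by type, with counts and total sizes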

linqu
    It's worth noting that `pympler` can be **SLOW**. If you're doing something semi-realtime, it can completely cripple your application performance. – Fake Name Feb 20 '17 at 22:41
  • @sebpiq strangely, the same happens to me... do you have any idea *why* this is happening? A quick look at the source code gave no real insights. – linusg Mar 31 '18 at 19:05
  • When you say slow you mean slow in the code in between, or only when running print_diff ? – Nathan B Nov 27 '22 at 20:50
  • Great tool! Found a sneaky memory leak in a library using pympler (=> https://github.com/python-babel/babel/issues/962) – Edward Gaere Jan 28 '23 at 05:05
33

Let me recommend the mem_top tool, which I created.

It helped me solve a similar issue.

It just instantly shows the top suspects for memory leaks in a Python program.
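
Usage (per the project's README) is as simple as logging its output from time to time:

import logging
from mem_top import mem_top

# periodically dump the top suspects: biggest collections, most referenced objects, etc.
logging.debug(mem_top())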

Denis Ryzhkov
    that's true... but it gives very little in the way of usage/results explanation – me_ Feb 03 '18 at 18:59
  • @me_ , this tool has both "Usage" and "Explaining result" sections documented. Should I add explanation like "refs is count of references from the object, types is count of objects of this type, bytes is size of the object" - wouldn't it be too obvious to document this? – Denis Ryzhkov Feb 05 '18 at 08:43
  • the tool's usage docs give a single line saying "from time to time: logging.debug(mem_top())", while its explanation of results is the author's real life error tracking experience without context... that's not a technical specification that tells a dev exactly what they are looking at... I'm not knocking your answer... it shows high level suspects as billed... it doesn't give adequate documentation to fully comprehend the result of use... for example, in the "Explaining Results" output why is the "GearmanJobRequest" obviously a problem? no explanation for why... – me_ Feb 05 '18 at 16:27
    i guess i'm inadvertently knocking your tool, you are the author... no offense was intended... – me_ Feb 05 '18 at 16:31
    @me_ , I've just added the next step to "Usage", added "Counters" section, added explanation why exactly Gearman was a suspect in that real life example, documented each optional parameter of "mem_top()" in the code, and uploaded this all as v0.1.7 - please take a look if anything else could be improved. Thank you! ) – Denis Ryzhkov Feb 07 '18 at 09:18
27

The tracemalloc module was integrated as a built-in module starting with Python 3.4, and apparently it is also available for prior versions of Python as a third-party library (I haven't tested it, though).

This module is able to output the precise files and lines that allocated the most memory. IMHO, this information is infinitely more valuable than the number of allocated instances of each type (which ends up being a lot of tuples 99% of the time; that is a clue, but it barely helps in most cases).

I recommend you use tracemalloc in combination with pyrasite. 9 times out of 10, running the top 10 snippet in a pyrasite-shell will give you enough information and hints to fix the leak within 10 minutes. Yet, if you're still unable to find the cause of the leak, pyrasite-shell in combination with the other tools mentioned in this thread will probably give you some more hints. You should also take a look at all the extra helpers provided by pyrasite (such as the memory viewer).
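
For reference, a minimal sketch of the kind of "top 10" snippet meant here, adapted from the tracemalloc documentation (inside a pyrasite-shell you would run the same lines against the live process):

import tracemalloc

tracemalloc.start()

# ... run the code you suspect of leaking ...

snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics('lineno')[:10]:
    print(stat)   # file, line number and total size allocated at that line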

user1527491
13

You should especially have a look at your global or static data (long-lived data).

When this data grows without restriction, you can also get into trouble in Python.

The garbage collector can only collect data that is not referenced any more. But your static data can hold on to data elements that should be freed.

Another problem can be memory cycles, but at least in theory the garbage collector should find and eliminate cycles, at least as long as they are not hooked onto some long-living data.

What kinds of long-living data are especially troublesome? Have a good look at any lists and dictionaries: they can grow without any limit. With dictionaries you might not even see the trouble coming, since when you access a dict, the number of keys in it might not be very visible to you ...
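
A small, hypothetical illustration of the pattern to watch for: a module-level dict used as a cache, where entries are added but never evicted, so it grows for the lifetime of the process:

_cache = {}   # module-level, lives as long as the process does

def handle(request_id, payload):
    # every new request_id adds an entry that is never removed
    if request_id not in _cache:
        _cache[request_id] = [payload] * 1000
    return _cache[request_id]

Bounding such structures (for example with `functools.lru_cache(maxsize=...)`, or by evicting old keys explicitly) keeps them from growing forever.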

Antony Hatchkins
Juergen
9

To detect and locate memory leaks in long-running processes, e.g. in production environments, you can now use stackimpact. It uses tracemalloc underneath. More info in this post.


logix
7

As far as best practices go, keep an eye out for recursive functions. In my case I ran into issues with recursion where there didn't need to be any. A simplified example of what I was doing:

def my_function():
    # lots of memory intensive operations
    # like operating on images or huge dictionaries and lists
    .....
    my_flag = True
    if my_flag:  # restart the function if a certain flag is true
        my_function()

def main():
    my_function()

Operating in this recursive manner won't let garbage collection clear out the remains of the function: each nested call keeps its caller's frame (and all of its local variables) alive, so memory usage grows and grows with every pass.

My solution was to pull the recursive call out of my_function() and have main() handle when to call it again. This way the function returns naturally and cleans up after itself.

def my_function():
    # lots of memory intensive operations
    # like operating on images or huge dictionaries and lists
    .....
    my_flag = True
    .....
    return my_flag

def main():
    result = my_function()
    while result:  # keep re-running until the flag comes back False
        result = my_function()
The4thIceman
    Using recursion in this manner will also break if you hit the recursion depth limit because Python doesn't optimize tail calls. By default, this is 1000 recursive calls. – Lie Ryan Mar 26 '17 at 02:52
3

Not sure about "Best Practices" for memory leaks in Python, but Python should clear its own memory via its garbage collector. So mainly I would start by checking for circular references of some sort, since they won't always be picked up by the garbage collector.
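
As a hypothetical illustration, a simple reference cycle; on interpreters before Python 3.4, cycles whose objects define `__del__` ended up stuck in `gc.garbage` instead of being freed:

import gc

class Node:
    def __init__(self):
        self.partner = None
    def __del__(self):
        pass   # a __del__ method made cycle members uncollectable before Python 3.4

a, b = Node(), Node()
a.partner, b.partner = b, a   # reference cycle: a -> b -> a
del a, b                      # no outside references remain, but the cycle keeps both alive

gc.collect()
print(gc.garbage)             # empty on modern Python; held the cycle members on older versions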

martiert
2

This is by no means exhaustive advice. But the number one thing to keep in mind when writing with the aim of avoiding future memory leaks (reference cycles) is to make sure that anything which accepts a reference to a callback stores that callback as a weak reference.
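
A minimal sketch of the idea, using the standard `weakref` module (the class names here are hypothetical), so that registering a callback does not keep its owner alive:

import weakref

class Emitter:
    """Hypothetical event source that stores its callback only as a weak reference."""
    def __init__(self):
        self._callback_ref = None

    def set_callback(self, bound_method):
        # WeakMethod holds a bound method without keeping its instance alive
        self._callback_ref = weakref.WeakMethod(bound_method)

    def fire(self):
        callback = self._callback_ref() if self._callback_ref else None
        if callback is not None:
            callback()

class Listener:
    def on_event(self):
        print("event received")

listener = Listener()
emitter = Emitter()
emitter.set_callback(listener.on_event)
emitter.fire()     # prints "event received"
del listener       # the emitter does not keep the listener alive
emitter.fire()     # the weak reference is dead, so nothing happens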

Dmitry Rubanovich