759

I want to know the memory usage of my Python application, and specifically I want to know which code blocks/portions or objects are consuming the most memory. A Google search shows that a commercial option is Python Memory Validator (Windows only).

And the open-source ones are PySizer and Heapy.

I haven't tried any of them, so I wanted to know which one is the best, considering:

  1. It gives the most details.

  2. It requires the fewest (or no) changes to my code.

Anurag Uniyal
  • For finding the sources of leaks I recommend objgraph. – pi. Nov 15 '12 at 10:23
  • @MikeiLL There is a place for questions like these: [SoftwareRecs.SE] – Poik Feb 05 '15 at 19:12
  • This is happening often enough that we should be able to migrate one question to another forum instead. – zabumba Apr 11 '16 at 14:53
  • One tip: if someone uses GAE and wants to check memory usage, it's a big headache, because those tools either output nothing or fail to start at all. If you want to test something small, move the function you want to test to a separate file and run that file alone. – alexche8 Jul 22 '16 at 11:34
  • I recommend [pympler](https://pythonhosted.org/Pympler/) – zzzeek Jun 20 '17 at 13:57
  • Check out [memray](https://github.com/bloomberg/memray) – Levon Apr 27 '22 at 12:36

8 Answers

494

My module memory_profiler is capable of printing a line-by-line report of memory usage, and it works on Unix and Windows (it needs psutil on the latter). The output is not very detailed, but the goal is to give you an overview of where the code is consuming the most memory rather than an exhaustive analysis of allocated objects.

After decorating your function with @profile and running your code with the -m memory_profiler flag, it will print a line-by-line report like this:

Line #    Mem usage  Increment   Line Contents
==============================================
     3                           @profile
     4      5.97 MB    0.00 MB   def my_func():
     5     13.61 MB    7.64 MB       a = [1] * (10 ** 6)
     6    166.20 MB  152.59 MB       b = [2] * (2 * 10 ** 7)
     7     13.61 MB -152.59 MB       del b
     8     13.61 MB    0.00 MB       return a
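
For reference, here is a sketch of the kind of script that produces the report above (the filename example.py is arbitrary; the function body is reconstructed from the "Line Contents" column). Run it as `python -m memory_profiler example.py`:

# example.py
from memory_profiler import profile

@profile
def my_func():
    a = [1] * (10 ** 6)        # ~8 MB list of int references
    b = [2] * (2 * 10 ** 7)    # ~153 MB list, released by the del below
    del b
    return a

if __name__ == '__main__':
    my_func()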
Fabian Pedregosa
  • For my use case - a simple image manipulation script, not a complex system, which happened to leave some cursors open - this was the best solution. Very simple to drop in and figure out what's going on, with minimal gunk added to your code. Perfect for quick fixes and probably great for other applications too. – floer32 Apr 08 '13 at 12:01
  • This is great. Is there any way to use it to collect memory usage per object (as opposed to per line)? Ideally from an IPython session with objects already in memory. If not, do you have any pointers on something along these lines? – Amelio Vazquez-Reina Aug 20 '13 at 15:28
  • It doesn't get memory usage of individual objects. For that task, [guppy/heapy](https://pypi.python.org/pypi/guppy) might be what you want. – Fabian Pedregosa Aug 22 '13 at 06:37
  • I find memory_profiler to be really simple and easy to use. I want to do profiling per line and not per object. Thanks for writing. – tommy.carstensen Sep 08 '13 at 17:27
  • @FabianPedregosa How does memory_profiler handle loops? Can it identify the loop iteration number? – Glen Fletcher Jun 17 '14 at 08:42
  • It identifies loops only implicitly, when it reports the line-by-line amounts and finds duplicated lines. In that case it just takes the max of all iterations. – Fabian Pedregosa Jun 17 '14 at 09:15
  • I tried to profile the memory usage of a Python application that was using `tensorflow` in CPU mode, depending on the input image size, and `python -m memory_profiler example.py` did not give me correct results, while `mprof` gave me results similar to `htop`. – mrgloom Nov 10 '17 at 13:30
  • Does not seem to perform very well in CPU-intensive programs. – jarandaf Nov 21 '17 at 10:51
  • @FabianPedregosa: How do I specify the installation path? I want to install it into another Python folder. Thx – Lion Lai Nov 23 '17 at 10:27
  • Same way as any other Python package: `pip install --target=/custom/path memory_profiler` – Fabian Pedregosa Nov 27 '17 at 19:50
  • I have tried `memory_profiler` but think it is not a good choice. It makes program execution incredibly slow (in my case, roughly 30 times slower). – AnnetteC Dec 12 '17 at 12:29
  • There is a constant per-line overhead in tracking memory consumption, so if your program is extremely long or has many fast for/while loops, I would expect it to slow down significantly. In that case, the time-based (as opposed to line-based) profiler might be better; it is run as `mprof run`. – Fabian Pedregosa Dec 13 '17 at 15:55
  • memory_profiler and heapy solve two different cases, I guess: one is concerned with memory consumption per line, while the other works at the level of objects. – PirateApp May 07 '18 at 07:05
  • @FabianPedregosa Does `memory_profiler` buffer its output? I may be doing something wrong, but it seems that rather than dump the profile for a function when it completes, it waits for the script to end. – Greenstick Jul 30 '18 at 17:51
  • It does indeed wait until the script finishes. It would not be easy to do otherwise, as the function could be called again, in which case memory_profiler will aggregate the results. – Fabian Pedregosa Jul 31 '18 at 18:07
  • @FabianPedregosa Thanks for such a useful and simple library! Though I'm confused by the output: when I run `mprof run test.py` and then `mprof plot`, I get different memory usage from the line-by-line output vs. over time. Line-by-line I get a maximum of 550 MiB, while from the plot I get a maximum of 5000 MiB. What could be the problem? Thanks! – sashaostr Sep 22 '19 at 14:39
  • For me memory_profiler slowed down execution by roughly a factor of 10! Note that I had large objects on the order of a few GB. Otherwise a cool tool. – Felix Mueller Jan 04 '21 at 12:52
  • This tool is no longer maintained. – SCGH Apr 04 '22 at 21:43
312

guppy3 is quite simple to use. At some point in your code, you have to write the following:

from guppy import hpy
h = hpy()
print(h.heap())

This gives you output like this:

Partition of a set of 132527 objects. Total size = 8301532 bytes.
Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
0  35144  27  2140412  26   2140412  26 str
1  38397  29  1309020  16   3449432  42 tuple
2    530   0   739856   9   4189288  50 dict (no owner)

You can also find out from where objects are referenced and get statistics about that, but somehow the docs on that are a bit sparse.
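
For example, a minimal sketch of those reference-pattern views, using attribute names from the guppy/heapy API (`byrcs` groups a set of objects by the kinds of objects referring to them):

from guppy import hpy

h = hpy()
heap = h.heap()
print(heap.byrcs)     # group all objects by referrer kind
print(heap[0].byrcs)  # who refers to the largest kind from the table above?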

There is a graphical browser as well, written in Tk.

For Python 2.x, use Heapy.

Torsten Marek
  • Sadly it doesn't seem to build or install on OS X... 10.4 at least. – shigeta Aug 28 '11 at 03:06
  • It builds on OS X 10.7.1 with homebrew, but sadly doesn't run :-( – Edward Grefenstette Sep 12 '11 at 00:53
  • If you're on Python 2.7 you may need the trunk version of it: http://sourceforge.net/tracker/?func=detail&aid=3047282&group_id=105577&atid=641821, `pip install https://guppy-pe.svn.sourceforge.net/svnroot/guppy-pe/trunk/guppy` – James Snyder Jan 03 '12 at 20:06
  • The latest version (0.1.9) builds on Windows for Python 2.6 x64, but the `h.heap()` call causes an APPCRASH. – utapyngo Jan 30 '12 at 05:41
  • The heapy docs are... not good. But I found this blog post very helpful for getting started: http://www.smira.ru/wp-content/uploads/2011/08/heapy.html – Joe Shaw Feb 13 '12 at 19:58
  • Heapy is by far the easiest heap profiler to run when attaching to a leaking Python process with [rfoo](http://code.google.com/p/rfoo/); it works fine in a multithreaded app and installs nicely with "pip install guppy". Usually the default view works, but hpy offers several views of the profile data, including showing use count by reference. The blog post linked by @JoeShaw is very helpful. – Patrick Horn Apr 04 '12 at 07:46
  • Note, heapy doesn't include memory allocated in Python extensions. If anybody has worked out a mechanism to get heapy to include `boost::python` objects, it would be nice to see some examples! – amos Jul 03 '14 at 18:08
  • As of 2014-07-06, guppy does not support Python 3. – Quentin Pradet Jul 16 '14 at 19:05
  • @JamesSnyder Looks like the normal pip version (1.10) is now OK with Python 2.7. – drevicko Jun 11 '15 at 03:03
  • Just installed fine with pip (Python 2.7). I found that the problem I wanted to use it for (memory use continually increasing) disappears when I call h.heap(). Any ideas why this might be? – ratatoskr Jul 22 '15 at 13:43
  • How is knowing that "str" is consuming the most memory in any way useful? That could be one of a million points in the code. Without knowing where those calls are made, the info provided here is useless. – Cerin Oct 15 '18 at 15:41
  • There is a fork of guppy that supports Python 3, called guppy3. – David Foster Aug 22 '19 at 23:00
  • Where do we insert this profiler code in our existing Python code? Should it be at the end/beginning? How do we integrate this profiler code into our existing code for resource usage stats? – Eswar Dec 03 '20 at 06:53
  • My favorite docs for Heapy/guppy3 are the research paper that created it, especially §6.2 "Debugging approach": http://liu.diva-portal.org/smash/get/diva2:22287/FULLTEXT01 – David Foster Apr 09 '21 at 12:56
84

I recommend Dowser. It is very easy to set up, and you need zero changes to your code. You can view counts of objects of each type through time, view the list of live objects, and view references to live objects, all from a simple web interface.

# memdebug.py

import cherrypy
import dowser

def start(port):
    # mount Dowser's object-browser UI and serve it on the given port
    cherrypy.tree.mount(dowser.Root())
    cherrypy.config.update({
        'environment': 'embedded',
        'server.socket_port': port
    })
    cherrypy.server.quickstart()
    cherrypy.engine.start(blocking=False)

You import memdebug, and call memdebug.start. That's all.
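
For example, a hypothetical usage sketch (the filename and port number are arbitrary):

# your_app.py (hypothetical)
import memdebug

memdebug.start(8080)  # then browse to http://localhost:8080 to watch object counts

# ... the rest of your program runs as normal ...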

I haven't tried PySizer or Heapy. I would appreciate others' reviews.

UPDATE

The above code is for CherryPy 2.x. In CherryPy 3.x, the server.quickstart method has been removed and engine.start does not take the blocking flag. So if you are using CherryPy 3.x:

# memdebug.py

import cherrypy
import dowser

def start(port):
    cherrypy.tree.mount(dowser.Root())
    cherrypy.config.update({
        'environment': 'embedded',
        'server.socket_port': port
    })
    cherrypy.engine.start()
sanxiyn
  • But is it only for CherryPy? How do I use it with a simple script? – Anurag Uniyal Sep 21 '08 at 05:05
  • It is not for CherryPy. Think of CherryPy as a GUI toolkit. – sanxiyn Sep 21 '08 at 07:07
  • FWIW, the PySizer page http://pysizer.8325.org/ seems to recommend heapy, which it says is similar. – Jacob Gabrielson Jul 07 '09 at 22:48
  • It looks as though your above code is for use with CherryPy 2.x. For CherryPy 3.x, remove the `blocking=False` from the `cherrypy.engine.start()` call. – Craig McQueen Sep 08 '10 at 03:47
  • There is a generic WSGI port of Dowser called Dozer, which you can use with other web servers as well: pypi.python.org/pypi/Dozer – Joe Shaw Feb 13 '12 at 19:58
  • CherryPy 3.1 removed cherrypy.server.quickstart(), so just use cherrypy.engine.start(). – MatsLindh Jan 24 '13 at 13:39
  • I like and use dowser, but the problem for me is that the application I'm using it in gives you like 1000 graphs, and it becomes a pain to find what is important; after you do, the pain point may have so many graphs that the trace page doesn't even load properly. So it doesn't scale very well. – rschwieb Apr 23 '18 at 19:52
  • It looks like aminus.net no longer exists. Some quick web searching found references to it only on aminus.net websites. Telling Anaconda Prompt `conda search dowser` found nothing. I would conclude that Dowser is no longer easily available, and is surely not being maintained. – Post169 Jun 11 '18 at 19:18
  • This doesn't work in Python 3. I get an obvious StringIO error. – dtc Mar 17 '20 at 15:09
  • Be careful: the link in the answer to Dowser is taking me to very phony websites impersonating more or less respectable news sources... – Damian Birchler Jun 03 '21 at 09:45
70

Consider the objgraph library (see this blog post for an example use case).
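
A quick sketch of the kind of calls the comments below mention, based on objgraph's documented helpers (the `limit` value is arbitrary):

import objgraph

objgraph.show_most_common_types(limit=10)  # counts of the most numerous object types
objgraph.show_growth()                     # call twice: prints type-count deltas between calls
dicts = objgraph.by_type('dict')           # returns a list of all live dict objects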

Charles Duffy
  • objgraph helped me solve a memory leak issue I was facing today. objgraph.show_growth() was particularly useful. – Ngure Nyaga Oct 11 '12 at 19:36
  • I, too, found objgraph really useful. You can do things like `objgraph.by_type('dict')` to understand where all of those unexpected `dict` objects are coming from. – dino Aug 12 '13 at 13:20
19

Muppy is (yet another) memory usage profiler for Python. The focus of this toolset is on the identification of memory leaks.

Muppy tries to help developers identify memory leaks in Python applications. It enables tracking of memory usage at runtime and identification of objects that are leaking. Additionally, it provides tools to locate the source of objects that were not released.
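
Muppy now ships as part of the Pympler package; a minimal sketch using its documented muppy/summary modules (assuming `pip install pympler`):

from pympler import muppy, summary

all_objects = muppy.get_objects()      # snapshot of all reachable Python objects
rows = summary.summarize(all_objects)  # aggregate counts and sizes by type
summary.print_(rows)                   # print a heapy-style table of the results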

Serrano
16

I'm developing a memory profiler for Python called memprof:

http://jmdana.github.io/memprof/

It allows you to log and plot the memory usage of your variables during the execution of the decorated methods. You just have to import the library using:

from memprof import memprof

And decorate your method using:

@memprof
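
Putting it together, a hypothetical sketch (the function body and the variable names `a`, `b` and `c` are illustrative only):

from memprof import memprof

@memprof
def my_func():
    a = [1] * (10 ** 6)
    b = [2] * (2 * 10 ** 7)
    c = a + b
    return c

my_func()  # the memory usage of a, b and c is logged while the function runs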

This is an example of what the plots look like:

[plot: memory usage of each variable over the execution of the decorated function]

The project is hosted on GitHub:

https://github.com/jmdana/memprof

jmdana
  • How do I use it? What are a, b, c? – tommy.carstensen Sep 08 '13 at 12:51
  • @tommy.carstensen `a`, `b` and `c` are the names of the variables. You can find the documentation at http://github.com/jmdana/memprof. If you have any questions, please feel free to submit an issue on GitHub or send an email to the mailing list that can be found in the documentation. – jmdana Sep 09 '13 at 12:13
12

I found meliae to be much more functional than Heapy or PySizer. If you happen to be running a WSGI web app, then Dozer is a nice middleware wrapper of Dowser.

7

Try also the pytracemalloc project, which provides the memory usage per Python line number.

EDIT (2014/04): It now has a Qt GUI to analyze snapshots.

vstinner
  • `tracemalloc` is now part of the Python standard library. See https://docs.python.org/3/library/tracemalloc.html – Dan Milon Feb 16 '15 at 18:48
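
Since Python 3.4, the same functionality is available in the standard library's `tracemalloc` module; a minimal sketch (the allocation being measured is just an example):

import tracemalloc

tracemalloc.start()

# ... the code you want to measure; an illustrative allocation:
data = [dict(x=i) for i in range(100000)]

snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics('lineno')[:10]:  # top 10 source lines by allocated size
    print(stat)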