
Do you know about an efficient way to log the memory usage of a Django app per request?

I have an Apache/mod_wsgi/Django stack which usually runs well, but sometimes one process ends up eating a huge amount of memory. The server ends up short on memory, swapping a lot, and services are dramatically slowed down.

This situation is quite hard to fix because I don't know which request is to blame for this behavior, and I can't reproduce it.

I'd like to have something deployed in production which logs the memory usage of the process before and after each request, with minimal overhead.


Before I start reinventing the wheel, does the community of my fellow Djangoists know of any existing solution to address this problem? Advice, middleware, snippets, or maybe Apache log configuration would be appreciated.

What (I think) I don't need is:

  • a set of dev-stage profiling/debugging tools. I already know some, and I'd use them if I knew what to profile/debug, but it seems like too much for permanently monitoring services running in production. On top of that, what those tools usually display is a memory usage report of the code shredded to pieces; it would really be helpful to just pinpoint the faulty request.
  • generic advice on how to optimize the memory usage of a Django app. It's always good to read, but the idea here is rather «how to efficiently track down requests which need to be optimized».

My closest search results:

  • Maybe this modwsgi option 'maximum-requests=nnn' will help. "Defines a limit on the number of requests a daemon process should process before it is shutdown and restarted. " – freestyler Sep 03 '12 at 13:54
  • @freestyler: yep, I already use this, but it kind of misses the point. The idea is to shine a light on faulty requests in order to actually fix them, not to sanitize the system periodically (which might be useful too). Besides, a memory-consuming request could appear early after a restart, so there's no correlation here. – ddelemeny Sep 04 '12 at 10:41

2 Answers


A Django middleware for tracking memory usage and generating a usable result immediately needs to hook both process_request and process_response. In other words, look at the difference between the start and finish of the request and log a warning if it exceeds some threshold.

A complete middleware example is:

import os
import psutil
import sys

THRESHOLD = 2*1024*1024

class MemoryUsageMiddleware(object):

    def process_request(self, request):
        request._mem = psutil.Process(os.getpid()).memory_info()

    def process_response(self, request, response):
        mem = psutil.Process(os.getpid()).memory_info()
        diff = mem.rss - request._mem.rss
        if diff > THRESHOLD:
            print >> sys.stderr, 'MEMORY USAGE %r' % ((diff, request.path),)
        return response

This requires the 'psutil' module to be installed for doing memory calculation.

This is brute force and can lead to false positives in a multithreaded system. Because of lazy loading, you will also see it trigger on the first few requests against a new process as stuff loads up.

Graham Dumpleton
  • That's neat, thank you for the answer! False positives are not really a problem, as the purpose is only to trigger and focus further investigation; coupling this info with PID and date/time may help understand and eliminate those cases quickly. As a side note, it may be worth putting this into a WSGI middleware, so that it's not tied to the Django machinery. – ddelemeny Sep 04 '12 at 10:20
  • To do it properly as a WSGI middleware gets horribly more complicated. When I did this it was for Django users, so it was easier to do it as a Django middleware. :-) – Graham Dumpleton Sep 05 '12 at 00:58
  • Oh, yeah, sure ! It was just food for thoughts, I didn't expect you to whip out a WSGI middleware. In fact, I didn't expect somebody to create a django middleware out of the blue to answer my question at all, thanks for that ! SO community looks quite awesome ;-) – ddelemeny Sep 05 '12 at 06:00
  • I am getting this error: 'WSGIRequest' object has no attribute '_mem' at this line: diff = mem.rss - request._mem.rss – vijay shanker Jun 22 '14 at 18:28
  • @vijayshanker, I know it has been a long time and you have probably found the issue, but, just for the record, the reason you were seeing that error is that some requests don't trigger both process_request and process_response; a 301 (redirect), for example, may only trigger process_response. This also means this solution has a flaw: you should at least check whether the request has '_mem' before accessing it. – Jerry Meng Mar 10 '15 at 15:10
  • An incredibly useful answer, but use caution since process_request is NOT always called for the corresponding process_response call (see https://docs.djangoproject.com/en/1.8/topics/http/middleware/#process-response). I simply wrapped the process_response logic in an "if hasattr(request, '_mem'):" and that fixed it for me. – foslock Jul 27 '15 at 19:27
  • Based on this answer I wrote a Django middleware that logs the info to Datadog: https://github.com/krmboya/dj-datadog in case anyone might be interested. – krm Nov 10 '16 at 07:02
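As several of the comments above point out, guarding process_response with hasattr() avoids the '_mem' AttributeError on requests that never went through process_request. A sketch of that defensive variant, using the stdlib resource module in place of psutil so it has no third-party dependency (note that ru_maxrss is a peak value whose units differ by platform, so treat the diff as a coarse signal only):

```python
import resource
import sys

THRESHOLD = 2 * 1024 * 1024


def current_rss():
    # Peak resident set size of this process.
    # ru_maxrss is in KiB on Linux and bytes on macOS.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


class MemoryUsageMiddleware(object):

    def process_request(self, request):
        request._mem = current_rss()

    def process_response(self, request, response):
        # Some requests (e.g. redirects short-circuited by earlier
        # middleware) never pass through process_request, so check first.
        if hasattr(request, '_mem'):
            diff = current_rss() - request._mem
            if diff > THRESHOLD:
                print('MEMORY USAGE %r' % ((diff, request.path),),
                      file=sys.stderr)
        return response
```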

This may not fully cover your question, but I recommend trying nginx+uwsgi instead of apache2+mod_wsgi. In my tests it turned out to be much more stable (mod_wsgi choked completely at some point), much faster, and to use a lot less memory (it may just fix all your issues altogether).

About tracking memory usage, you can create a simple middleware:

class SaveMemoryUsageMiddleware(object):
    def process_response(self, request, response):
        # track memory usage here and append to file or db
        return response

and add it to your middleware settings.

For memory tracking code I recommend checking out: Total memory used by Python process?
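One stdlib-only way to fill in that stub, in the spirit of the linked question; the log path and line format here are purely illustrative, and the /proc read is Linux-specific:

```python
import time


def rss_bytes():
    # Linux-only: read the current resident set size from /proc.
    # Returns 0 where /proc is unavailable (e.g. macOS); a fallback
    # could use resource.getrusage or psutil instead.
    try:
        with open('/proc/self/status') as f:
            for line in f:
                if line.startswith('VmRSS:'):
                    return int(line.split()[1]) * 1024  # value is in kB
    except IOError:
        pass
    return 0


class SaveMemoryUsageMiddleware(object):
    # Hypothetical log location; point this wherever suits your setup.
    LOG_PATH = '/tmp/memory_usage.log'

    def process_response(self, request, response):
        # Append one line per request: timestamp, path, RSS in bytes.
        with open(self.LOG_PATH, 'a') as log:
            log.write('%s %s %d\n' % (time.strftime('%Y-%m-%dT%H:%M:%S'),
                                      request.path, rss_bytes()))
        return response
```

Appending a plain text file keeps the per-request overhead minimal; the resulting log can then be sorted offline to pinpoint the heaviest requests.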

However, it would probably be better if you could avoid doing this in production; use it just in dev and tests to track down the real problem.

arkens
  • Thank you for the answer; replacing Apache is somewhat planned, but low priority. Nginx is already at the top of my stack. The middleware way is the sort of thing I was planning to do if nobody could come up with an existing solution. Thanks for the pointers on memory consumption monitoring inside Python, that's likely to help. – ddelemeny Sep 03 '12 at 17:53
  • If your Apache used a lot more memory, then you set Apache up wrong, simple as that. Watch my PyCon talk, as it covers the common mistake people make of using the Apache defaults. http://lanyrd.com/2012/pycon/spcdg/ – Graham Dumpleton Sep 03 '12 at 21:46
  • There is a big difference in how nginx and apache2 work internally. You cannot expect that a process/thread-based pool (Apache) will by any chance use similar amounts of memory as an event-based pool (nginx). Simple as that. – arkens Oct 18 '12 at 17:14
  • I've used both Apache and Nginx for hosting Django, and there's not much of a difference performance-wise. Most benchmark tests I've seen also confirm this. – Cerin Mar 12 '19 at 17:46