I have the following code, where a function-based view uses a ModelSerializer to serialize data. I am running this with Apache + mod_wsgi (with 1 worker thread, 1 child thread and 1 mod_wsgi thread for the sake of simplicity).

With this, my memory usage shoots up significantly (200 MB to 1 GB, depending on how large the query is), stays there, and does not come down even after the request completes. On subsequent requests to the same view/URL the memory increases slightly every time, but does not take another significant jump. To rule out issues with django-filter, I have modified my view and written the filtering query myself.

The usual suspect, DEBUG=True, is ruled out, as I am not running in DEBUG mode. I have even tried to use guppy to see what is happening, but I was unable to get far with it. Could someone please explain why the memory usage does not come down after the request is completed, and how to go about debugging it?
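For reference, this is roughly the pattern I was attempting with guppy (a simplified sketch; the placement around the expensive work is illustrative):

```python
# Illustrative only: dump a heap summary around the expensive work
# to see which object types account for the growth.
from guppy import hpy

hp = hpy()
hp.setrelheap()        # measure heap growth relative to this point

# ... evaluate the queryset / run the serialization here ...

print(hp.heap())       # breakdown of live objects by type and size
```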

Update: I am using the default CACHES setting, i.e. I have not defined it at all, in which case I presume it will use local memory for the cache, as mentioned in the docs:

CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.locmem.LocMemCache',
    }
}



from django.db import models


class MeterData(models.Model):
    meter = models.ForeignKey(Meter)
    datetime = models.DateTimeField()

    # Active Power Total
    w_total = models.DecimalField(max_digits=13, decimal_places=2,
                                  null=True)
    ...


from rest_framework import serializers


class MeterDataSerializer(serializers.ModelSerializer):
    class Meta:
        model = MeterData
        exclude = ('meter', )


import logging

from rest_framework.decorators import api_view, permission_classes
from rest_framework.permissions import AllowAny
from rest_framework.response import Response

logger = logging.getLogger(__name__)


@api_view(['GET', ])
@permission_classes((AllowAny,))
def test(request):
    startDate = request.GET.get('startDate', None)
    endDate = request.GET.get('endDate', None)
    meter_pk = request.GET.get('meter', None)
    # Writing the query ourselves instead of using django-filter
    # to keep things simple.
    queryset = MeterData.objects.filter(meter__pk=meter_pk,
                                        datetime__gte=startDate,
                                        datetime__lte=endDate)

    logger.info(queryset.query)
    kwargs = {}
    kwargs['context'] = {
        'request': request,
        'view': test,
        'format': 'format',
    }
    kwargs['many'] = True

    serializer = MeterDataSerializer(queryset, **kwargs)
    return Response(serializer.data)
Divick
  • What cache backend are you using? – Sayse Dec 01 '16 at 11:42
  • @Sayse: I have left the CACHES setting at its default, i.e. not defined it, in which case it will use local memory for the cache, I presume. – Divick Dec 01 '16 at 11:56
  • I don't believe it is a leak, I think what you're seeing is data cached but I don't think I have enough to say for certain.. ([local memory caching](https://docs.djangoproject.com/en/1.10/topics/cache/#local-memory-caching)) – Sayse Dec 01 '16 at 12:00
  • @Sayse: Yeah, to me it also appears that this is caching rather than a memory leak, as is evident from the memory usage not increasing much on the second query, but I am not sure why it is caching in the first place and what it is caching. – Divick Dec 01 '16 at 12:04
  • I assume the query/results but not sure – Sayse Dec 01 '16 at 12:06
  • @Sayse: I explicitly tried to set the timeout of LocMem cache to a very small value (1 sec) and it still doesn't evict the cache. So it seems that timeout doesn't work for LocMem cache. – Divick Dec 01 '16 at 14:40
  • I don't know too much about it I'm afraid, just thought it might be the issue – Sayse Dec 01 '16 at 20:42
  • Ignoring the cache, the way the UNIX memory model works is that when a process allocates memory, even if it is freed, in most cases it is only freed back to the in-process memory allocator; it doesn't get freed back to the operating system. Thus the process memory usage will not reduce, but the memory will still be reused for subsequent allocations within the same process. – Graham Dumpleton Dec 01 '16 at 23:46
  • So if you pull in huge amounts of data and work on it, you can expect the memory usage of the process to be quite high. If you are processing the data, don't pull it all into memory at once if possible; pull it in batches and process it a part at a time (a sketch of this follows these comments). – Graham Dumpleton Dec 01 '16 at 23:47
  • I am still struggling with the same case. Has anyone figured it out yet? I have pretty much the same issue: if my API view is called 100 times, it increases my memory by ~10 MB each time, i.e. ~1 GB in total, and I always need to restart my server to free the memory. I don't even have CACHES configured, but this still occurs. I used cProfile and a memory profiler in Python, and I can see that after every request the post(request) function of my API view allocates ~10 MB to some JSON dict data which I send back as the response (Response(data=dict_data)); the memory always increases and never comes down. – DJDeveloper Apr 23 '22 at 11:24
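A rough sketch of the batching approach Graham Dumpleton describes above (serialize_in_batches and the batch size of 1000 are illustrative choices, not part of the original code):

```python
# Sketch: stream rows in fixed-size batches instead of materialising
# the whole result set, so peak memory is bounded by the batch size.
def serialize_in_batches(queryset, batch_size=1000):
    batch = []
    # .iterator() fetches rows lazily instead of caching the whole
    # result set on the QuerySet object.
    for obj in queryset.iterator():
        batch.append(obj)
        if len(batch) >= batch_size:
            yield MeterDataSerializer(batch, many=True).data
            batch = []
    if batch:
        yield MeterDataSerializer(batch, many=True).data
```

Note that, per the comments above, the process's high-water mark still won't be released back to the operating system, but it stays bounded by the batch size rather than by the full result set.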

1 Answer

Whilst I can't say for certain, I'll add this as an answer anyway to be judged on it...

As you know, Django's default cache backend is the LocMemCache.

In the docs linked above you'll find:

> Note that each process will have its own private cache instance

And I think this is all you're seeing. The jump in memory is just the storage of your query results. I'd only be concerned if this memory usage continued to grow beyond a normal level.

The same doc also says this backend may not be very viable in production, so it might be time to move to a different cache backend, which would also let you confirm whether caching is the culprit.
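A minimal sketch of such a move, assuming a Memcached instance is available on localhost (the LOCATION value is an assumption for illustration):

```python
# Swap LocMemCache for a shared, out-of-process cache so cached data
# no longer lives inside each worker's own memory.
CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache',
        'LOCATION': '127.0.0.1:11211',
    }
}
```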

Sayse