22

I am using Django REST framework for my API, and yesterday I wanted to see how it behaves with large data sets. I found this tutorial on profiling requests (written by Tom Christie) and discovered that for 10,000 users my request was taking an astonishing 2 minutes 20 seconds.

Most of the time (around 65%) was being spent on serializing the objects, so I was wondering what I can do to speed things up.

My user model actually extends the default Django user model, so using .values() does not work for me: it is a LOT faster, but I do not get the nested model back.

Any help would be greatly appreciated :)

Edit

I am already using .select_related() when retrieving my queryset, and it has improved my time, but only by a few seconds. The total number of queries is 10, so my problem is not with database access.

Also, I am using .defer(), in order to avoid fields that I don't need in this request. That also provided a small improvement, but not enough.

Edit #2

Models

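from django.contrib.auth.models import User
from django.db.models import ForeignKey, OneToOneField
# Provides _() used for the field labels below (Django 1.x-era import).
from django.utils.translation import ugettext_lazy as _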

from userena.models import UserenaLanguageBaseProfile
from django_extensions.db.fields import CreationDateTimeField
from django_extensions.db.fields import ModificationDateTimeField

from mycompany.models import MyCompany


class UserProfile(UserenaLanguageBaseProfile):
    user = OneToOneField(User, related_name='user_profile')
    company = ForeignKey(MyCompany)
    created = CreationDateTimeField(_('created'))
    modified = ModificationDateTimeField(_('modified'))

Serializers

from django.contrib.auth.models import User

from rest_framework import serializers

from accounts.models import UserProfile


class UserSerializer(serializers.ModelSerializer):
    last_login = serializers.ReadOnlyField()
    date_joined = serializers.ReadOnlyField()
    is_active = serializers.ReadOnlyField()

    class Meta:
        model = User
        fields = (
            'id',
            'last_login',
            'username',
            'first_name',
            'last_name',
            'email',
            'is_active',
            'date_joined',
        )


class UserProfileSerializer(serializers.ModelSerializer):
    user = UserSerializer()

    class Meta:
        model = UserProfile
        fields = (
            'id',
            'user',
            'mugshot',
            'language',
        )

Views

class UserProfileList(generics.GenericAPIView,
                      mixins.ListModelMixin,
                      mixins.CreateModelMixin):

    serializer_class = UserProfileSerializer
    permission_classes = (UserPermissions, )

    def get_queryset(self):
        company = self.request.user.user_profile.company
        return UserProfile.objects.select_related().filter(company=company)

    @etag(etag_func=UserListKeyConstructor())
    def get(self, request, *args, **kwargs):
        return self.list(request, *args, **kwargs)
Akshat Zala
AdelaN
  • We're going to need to see your models and serializers in order to see what might be slow. – Kevin Brown-Silva Mar 13 '15 at 11:11
  • OK, so I restarted my server (because it was throwing some exceptions) and also removed some unnecessary fields, and now it runs a lot better (about 32 seconds). Does that seem like an acceptable time to you, Kevin? – AdelaN Mar 13 '15 at 14:34
  • I have no idea what you are doing in your serializers, so that could either be really good (if you are using a `SerializerMethodField` or property that processes data) or really bad (if you're just pulling things from the database). – Kevin Brown-Silva Mar 14 '15 at 16:36
  • I added the relevant code snippets. Not doing any data processing, just computing some ETags. I will rerun the tests today to see if I have the same time. – AdelaN Mar 16 '15 at 12:21
  • Move to Flask, you'll be glad you did. – JTW Aug 23 '23 at 21:12

3 Answers

15

Performance issues like this almost always come from N+1 queries. They usually arise because the serializer references related models, and one extra query per relationship per object is issued to fetch that information. You can improve this by using .select_related and .prefetch_related in your get_queryset method, as described in my other Stack Overflow answer.
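To make the N+1 pattern concrete, here is a minimal sketch that uses only the standard library's sqlite3 module rather than Django. The table names, the `build_db`, `naive` and `joined` helpers and the row shapes are all assumptions made up for the illustration; the point is only to count how many SELECT statements each approach issues.

```python
import sqlite3


def build_db(n_users=50):
    """Create an in-memory database with one profile per user (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE auth_user (id INTEGER PRIMARY KEY, username TEXT)")
    conn.execute(
        "CREATE TABLE profile (id INTEGER PRIMARY KEY, user_id INTEGER, language TEXT)"
    )
    ids = range(1, n_users + 1)
    conn.executemany("INSERT INTO auth_user VALUES (?, ?)", [(i, f"user{i}") for i in ids])
    conn.executemany("INSERT INTO profile VALUES (?, ?, ?)", [(i, i, "en") for i in ids])
    conn.commit()
    return conn


def count_selects(conn, work):
    """Run work(conn) and return (result, number of SELECT statements issued)."""
    statements = []
    conn.set_trace_callback(statements.append)
    try:
        result = work(conn)
    finally:
        conn.set_trace_callback(None)
    selects = sum(1 for s in statements if s.lstrip().upper().startswith("SELECT"))
    return result, selects


def naive(conn):
    """One query for the profiles, then one more query per profile for its user."""
    rows = conn.execute("SELECT id, user_id, language FROM profile ORDER BY id").fetchall()
    out = []
    for profile_id, user_id, language in rows:
        (username,) = conn.execute(
            "SELECT username FROM auth_user WHERE id = ?", (user_id,)
        ).fetchone()
        out.append({"id": profile_id, "language": language, "user": {"username": username}})
    return out


def joined(conn):
    """A single JOIN, which is roughly what naming a relation in select_related does."""
    rows = conn.execute(
        "SELECT p.id, p.language, u.username FROM profile p "
        "JOIN auth_user u ON u.id = p.user_id ORDER BY p.id"
    ).fetchall()
    return [
        {"id": profile_id, "language": language, "user": {"username": username}}
        for profile_id, language, username in rows
    ]
```

Running `count_selects(build_db(50), naive)` and `count_selects(build_db(50), joined)` side by side shows the same data being produced with very different numbers of round trips.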

The same tips that Django provides on database optimization also apply to Django REST framework, so I would recommend looking into those as well.

You are seeing the performance issues during serialization because querysets are lazy: that is when Django actually makes the queries to the database.
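One way to check where the time really goes is to force the fetch before serializing and time each stage separately. The sketch below is only an illustration under assumptions: `lazy_rows` is a hypothetical stand-in for a lazy queryset (a generator with a simulated database delay), and `timed` and `profile_stages` are names invented here. In a real view the "fetch" stage would be something like `list(queryset)` and the "serialize" stage would be reading `serializer.data`.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, timings):
    """Record the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start


def lazy_rows(n, delay=0.05):
    """Stand-in for a lazy queryset: nothing runs until it is iterated."""
    time.sleep(delay)  # simulated database round trip, paid on first iteration
    for i in range(n):
        yield {"id": i, "username": f"user{i}"}


def profile_stages(n=1000):
    timings = {}
    with timed("build", timings):
        rows = lazy_rows(n)  # building the lazy object costs almost nothing
    with timed("fetch", timings):
        fetched = list(rows)  # forcing evaluation is where the "database" time lands
    with timed("serialize", timings):
        payload = [dict(row) for row in fetched]  # pure Python work on rows in memory
    return timings, payload
```

If the "serialize" figure stays large even after the rows are already in memory, the cost is in Python rather than in the database.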

phoenix
Kevin Brown-Silva
  • I am already using .select_related(), so that's not it. I did look into your answer before posting the question :) It helped, but it is still very slow. Thanks. – AdelaN Mar 13 '15 at 08:43
  • There appear to be cases where even using select_related() with no parameters (not a great idea for production, but good for debugging this issue), still results in N+1 queries. – GDorn Jun 10 '15 at 19:12
  • Keep in mind that select_related() is for ForeignKey fields. If you have ManyToMany fields and wish to reduce the number of queries, you need to use prefetch_related with those. – GDorn Jun 10 '15 at 21:06
  • Note that `prefetch_related` has to be sorted appropriately to work. Django Debug Toolbar will show repeated queries in case it doesn't. But still, it may not be the case of N+1. I'm here because SQL takes 1.5s, but there's still 5s additional CPU time. – dhill Feb 11 '19 at 14:15
12

ModelSerializers are slow, as you found out yourself. Here's some more information on why that happens and how to speed things up: https://hakibenita.com/django-rest-framework-slow

  • In performance-critical endpoints, use a "regular" serializer, or none at all.
  • Serializer fields that are not used for writing or validation should be read only.
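As a rough sketch of the "none at all" option: Django's .values() accepts double-underscore names such as 'user__username' to reach across a relation, and returns flat dictionaries keyed by those names. A small helper can turn those back into the nested shape the serializer produced. The `nest` function and the sample rows below are assumptions for illustration; the rows are hard-coded to look like what a call such as `.values('id', 'language', 'user__id', 'user__username')` might return, so the snippet runs without Django.

```python
import json


def nest(row, sep="__"):
    """Turn {'user__username': 'ann'} into {'user': {'username': 'ann'}}."""
    out = {}
    for key, value in row.items():
        *parents, leaf = key.split(sep)
        target = out
        for name in parents:
            # Walk (or create) one nested dict per relation segment.
            target = target.setdefault(name, {})
        target[leaf] = value
    return out


# Hypothetical flat rows, shaped like the output of a .values() call.
rows = [
    {"id": 1, "language": "en", "user__id": 10, "user__username": "ann"},
    {"id": 2, "language": "fr", "user__id": 11, "user__username": "bob"},
]

payload = [nest(row) for row in rows]
body = json.dumps(payload)
```

This skips model instantiation and field-by-field serialization entirely, at the cost of losing validation and any write support, so it only suits read-only endpoints.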
barrtin
8

I know this is old and you probably solved your problem already ... but for anyone else making it to this article...

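The problem is that you're doing a blind

select_related()

with no parameters. That follows every non-null foreign key it can find, so the JOIN also pulls in relations you never serialize (the company, for example). What you really need to do is name the one relation the serializer uses:

select_related('user')

Without getting into the details, select_related is for "to one" relationships (it adds a JOIN to the same query), and prefetch_related is for "to many" relationships (it runs a separate query per relation and matches the results up in Python). In your case, UserProfile.user is a one-to-one field, so it is a "to one" lookup and select_related is the right tool. Note that 'user_profile' is the related_name on the User side; from UserProfile the field is called 'user', so passing 'user_profile' to prefetch_related on a UserProfile queryset would fail.

Change get_queryset() in your view to this, so the query fetches only what the serializer needs:

def get_queryset(self):
    company = self.request.user.user_profile.company
    # Join only the relation that UserProfileSerializer actually reads.
    return UserProfile.objects.select_related('user').filter(company=company)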
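For a feel of what the prefetch strategy does on a "to many" relationship, here is a standard-library sketch using sqlite3 instead of Django. The schema (a company with many profiles) and the `prefetch_profiles` and `demo` helpers are assumptions invented for the illustration: one query loads the parent rows, a second loads all children with an IN clause, and the two are stitched together in Python.

```python
import sqlite3
from collections import defaultdict


def prefetch_profiles(conn):
    """Return companies with their usernames using two queries in total."""
    companies = conn.execute("SELECT id, name FROM company ORDER BY id").fetchall()
    ids = [company_id for company_id, _ in companies]
    if not ids:
        return []
    placeholders = ", ".join("?" for _ in ids)
    rows = conn.execute(
        "SELECT company_id, username FROM profile "
        f"WHERE company_id IN ({placeholders}) ORDER BY id",
        ids,
    ).fetchall()
    # Group the children by parent id in memory instead of querying per parent.
    by_company = defaultdict(list)
    for company_id, username in rows:
        by_company[company_id].append(username)
    return [
        {"id": company_id, "name": name, "users": by_company[company_id]}
        for company_id, name in companies
    ]


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE company (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE profile (id INTEGER PRIMARY KEY, company_id INTEGER, username TEXT)"
    )
    conn.executemany("INSERT INTO company VALUES (?, ?)", [(1, "acme"), (2, "globex")])
    conn.executemany(
        "INSERT INTO profile VALUES (?, ?, ?)",
        [(1, 1, "ann"), (2, 1, "bob"), (3, 2, "cy")],
    )
    return prefetch_profiles(conn)
```

The number of queries stays at two however many companies there are, which is the property that matters when the relation can return several rows per parent.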
jaredn3