
I like the Django + DRF combination and have used it for a long time, but this problem has bugged me for a while: queries plus serialization take a lot of time when there are ManyToMany or OneToMany relations and nested objects. There are plenty of similar questions on StackOverflow, and usually the culprit has been some form of the "N + 1" problem (or the question was never solved).

The usual suggestion of not loading that many objects at once probably applies here too, but I need all items at once.

Query handling is a big part of the problem here as well, but there are not many queries and the queries themselves are fast. I'm using prefetch_related to limit the number of queries, and judging from the DB logs everything looks OK-ish. I get one query for each prefetch_related property, plus the original query for the serialized objects. The prefetch_related queries contain lots and lots of IDs, but I guess that is inevitable: as many IDs as there are original items.

When profiling as shown at https://www.dabapps.com/blog/api-performance-profiling-django-rest-framework/, I see that DB lookups take most of the time, while serialization is not too fast either. As an example, I have about 12k "Items" in my local PostgreSQL database for one "Project". All "Items" have 1-5 "Events", and most "Items" also have 1-2 "Photos". Fetching and serializing that data takes around 22 seconds on my laptop. I'm deploying on AWS EC2 + RDS and the timing is about the same there. On larger "Item" sets serialization time grows faster than DB lookup time, but DB lookups always take most of the time. With 40k items you start to hit 1-minute execution times and timeouts from Nginx and other parts of the stack.

Example with 12k items (models, serializers and queries below)

Database lookup               | 14.0292s
Serialization                 | 6.0076s
Django request/response       | 0.3488s
API view                      | 0.2170s
Response rendering            | 1.1092s

If I leave Photos and Events out, the result is

Database lookup               | 1.2447s
Serialization                 | 3.9668s
Django request/response       | 0.2435s
API view                      | 0.1320s
Response rendering            | 0.8495s

So the related fields (many=True) are taking most of the time. The profiling I used for testing makes a list out of the queryset before serializing, so the lazy queries are executed before serialization starts. If I don't do that, the overall result doesn't change, but the DB lookups are evaluated during serialization and take about the same amount of time.
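
For reference, the phase timings above come from a pattern roughly like the following sketch (not the linked article's exact code; the profiling view name here is hypothetical):

import time

from rest_framework.response import Response

class ProfiledItemListAPIView(ItemListAPIView):  # ItemListAPIView as defined below

    def list(self, request, *args, **kwargs):
        t0 = time.time()
        items = list(self.get_queryset())                  # force DB lookups here
        t1 = time.time()
        data = self.get_serializer(items, many=True).data  # serialization happens here
        t2 = time.time()
        print('Database lookup | %.4fs' % (t1 - t0))
        print('Serialization   | %.4fs' % (t2 - t1))
        return Response(data)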

Now the problem for me is that all the queries are fast if executed manually (and the times that django.db.backends logs report are about the same). They are listed below. So I believe the SQL queries are fast, but DB lookups from Django's point of view are very slow. What am I missing here? Or how should I continue the investigation? It feels like it takes serious effort from Django to convert the SQL query results into Django model instances. Would that imply there's something wrong with my models?
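
One way to dig into that conversion step is to profile the queryset evaluation on its own, since model instance construction and prefetch joining happen in Python rather than in the database. A sketch, meant to be run from a Django shell (the import path is an assumption based on the table names in the logs):

import cProfile

from todo.models import Item  # app path assumed from the logged table names

# Profile only ORM evaluation: list(qs) builds the model instances and joins
# the prefetched relations; the profile shows where that Python time goes.
qs = (Item.objects
      .filter(project_id=1, deleted_at__isnull=True)
      .prefetch_related('created_by', 'photos', 'events__created_by', 'list'))
cProfile.runctx('list(qs)', globals(), locals(), sort='cumulative')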

General questions to help debug cases like this:

  1. What is a reasonable time to expect for fetching and serializing 10-50k objects that have related items as lists (many=True)? Even if the DB lookups were better, 6 seconds of serialization sounds wrong, doesn't it?
  2. What happens between the SQL query finishing and DRF serialization starting? Is there something one can do to make improvements there?
  3. It would probably be a good idea not to fetch full objects from the DB but to use values() instead. If done so, is there a way to still utilize DRF serializers (see the sketch below)? I like them a lot.
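
On question 3, a minimal sketch of what I have in mind: as far as I understand, DRF fields can read from dicts, so a plain Serializer can work on .values() rows as long as there are no nested serializers (the annotation and field names below are illustrative, not my real fields):

from django.db.models import Value
from django.db.models.functions import Concat
from rest_framework import serializers

class FlatItemSerializer(serializers.Serializer):
    # Plain Serializer over dicts: 'source' points at the dict keys
    id = serializers.IntegerField()
    listName = serializers.CharField(source='list__name')
    createdByFullName = serializers.CharField(source='full_name')

rows = (Item.objects
        .filter(project_id=1, deleted_at__isnull=True)
        .annotate(full_name=Concat('created_by__first_name',
                                   Value(' '),
                                   'created_by__last_name'))
        .values('id', 'list__name', 'full_name'))
data = FlatItemSerializer(rows, many=True).data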

In the end, I could turn to caching, but I would assume that handling < 100k objects should not be an issue if done correctly.


Setup: Python 2.7.13, Django 1.10.7, DRF 3.6.3

Simplified versions of models, views and serializers:

class List(models.Model):
    ... CharFields, DateTimeFields, ForeignKeys etc. ...

class Item(models.Model):
    list = models.ForeignKey(List, on_delete=models.CASCADE, db_index=True, null=True, related_name='items')
    deleted_at = models.DateTimeField(db_index=True, blank=True, null=True, default=None)
    created_by = models.ForeignKey(User, blank=False)
    project = models.ForeignKey('projects.Project', on_delete=models.CASCADE)
    ... other CharFields, DateTimeFields, ForeignKeys etc. ...

class Event(models.Model):
    item = models.ForeignKey(Item, on_delete=models.CASCADE, db_index=True, null=True, related_name='events')
    created_by = models.ForeignKey(User, blank=False)
    deleted_at = models.DateTimeField(db_index=True, blank=True, null=True, default=None)
    ... other CharFields, DateTimeFields, ForeignKeys etc. ...

class Photo(models.Model):
    item = models.ForeignKey(Item, on_delete=models.CASCADE, db_index=True, null=True, related_name='photos')
    created_by = models.ForeignKey(User, blank=False)
    deleted_at = models.DateTimeField(db_index=True, blank=True, null=True, default=None)
    ... other CharFields, DateTimeFields, ForeignKeys etc. ...


class PhotoSerializer(serializers.ModelSerializer):
    ... other CharFields, DateTimeFields, ForeignKeys etc. ...

class EventSerializer(serializers.ModelSerializer):
    createdBy = PrimaryKeyRelatedStringField(source='created_by', read_only=True)
    createdByFullName = serializers.CharField(source='created_by.get_full_name', read_only=True)
    ... other CharFields, DateTimeFields, ForeignKeys etc. ...

class ItemSerializer(serializers.ModelSerializer):
    listName = serializers.CharField(source='list.name', read_only=True)
    createdBy = PrimaryKeyRelatedStringField(source='created_by', read_only=True)
    createdByFullName = serializers.CharField(source='created_by.get_full_name', read_only=True)
    photos = PhotoSerializer(many=True, read_only=True)
    events = EventSerializer(many=True, required=False, allow_null=True)  # note: a nested Serializer does not accept a queryset kwarg
    ... other fields ...


class ItemListAPIView(ListAPIView):
    model = Item
    serializer_class = ItemSerializer

    def get_queryset(self):
        return Item.objects.all().filter(project_id=...).filter(deleted_at__isnull=True).prefetch_related(
            'created_by',           # ID of user who created item
            'photos',               # Photo properties
            'events__created_by',   # Full name of the person who created the event (related_name is 'events')
            'list',                 # Name of the related list
        )
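
For reference, created_by and list are forward ForeignKeys, so they could also be loaded with select_related in the main query instead of extra prefetch queries; a sketch of that variant (the Prefetch filters are assumptions):

from django.db.models import Prefetch

# select_related joins forward FKs into the main query; Prefetch lets the
# reverse relations carry their own filtered querysets.
queryset = (Item.objects
            .filter(project_id=..., deleted_at__isnull=True)
            .select_related('created_by', 'list')
            .prefetch_related(
                Prefetch('photos',
                         queryset=Photo.objects.filter(deleted_at__isnull=True)),
                Prefetch('events',
                         queryset=Event.objects.filter(deleted_at__isnull=True)
                                               .select_related('created_by')),
            ))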

Example queries from the tests with the 14 s DB lookup result:

django.db.backends: (0.196) SELECT "todo_item"."version", ... everything ... FROM "todo_item" WHERE ("todo_item"."project_id" = 1 AND "todo_item"."deleted_at" IS NULL) ORDER BY "todo_item"."created_at" DESC;
django.db.backends: (0.001) SELECT "auth_user"."id", ... everything ... FROM "auth_user" WHERE "auth_user"."id" IN (1, 2, ... some IDs ...);
django.db.backends: (0.148) SELECT "photos_photo"."version", ... everything ... FROM "photos_photo" WHERE ("photos_photo"."deleted_at" IS NULL AND "photos_photo"."item_id" IN (1, 2, ... lots of IDs... N)) ORDER BY "photos_photo"."created_at" DESC;
django.db.backends: (0.078) SELECT "events_event"."created_at", ... everything ... FROM "events_event" WHERE ("events_event"."deleted_at" IS NULL AND "events_event"."item_id" IN (1, 2, ... lots of IDs... N)) ORDER BY "events_event"."created_at" DESC, "events_event"."created_at" DESC; 
django.db.backends: (0.157) SELECT "todo_list"."created_at", ... FROM "todo_list" WHERE "todo_list"."id" IN (1, 2, ... lots of IDs... N)

Update 1

@pozs asked about the timing measurements: timing was tested locally with \timing and EXPLAIN ANALYZE. Not very detailed measurements or investigation, but enough to see that the queries together take less than 1 second, about the same as the django.db.backends logs suggest. The tests with the 14-second DB lookup times were also done locally, against a local database. Socket transfer times do have an effect, that is true. The objects are fat, but not that fat: there are about 15 fields in total for "Items" and fewer than 10 for "Events" and "Photos". That generates a lot of data, but enough data for > 10 s of transfer locally doesn't sound quite right. Though I might be wrong there; I'll do some more testing. Thanks for the answer, @pozs!

Update 2

To check whether the amount of transferred data is what slows things down, I ran the same queries with psql from one of my EC2 instances against the RDS DB with the same data, writing the output to a file. The SQL execution times were generally smaller than on my local setup; e.g. the Item SELECT that took 196 ms locally took 65 ms on EC2 + RDS. With data transfer plus the file write on top, it was 131 ms in total. The file write is extra, but I wanted to make sure I got all the data I expected.

I timed it from the EC2 instance with a command like:

time psql -f fat-query.psql --host=rds_host --port=5432 --username=user --dbname=dbname > output.txt

So the SQL queries are faster (including data transfer) in the EC2 + RDS setup, but the whole data query + serialization is about as slow as on my local setup. Even though there is a lot of data to move, I still think the "problem" happens somewhere between "Django receives the response for the SQL query" and "DRF starts serializing".
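
To narrow that gap down further, one more test I can do is timing the raw cursor fetch against full ORM evaluation for the same data (a sketch; the SQL is simplified from the logged queries):

import time

from django.db import connection

# If the raw fetch is fast but list(queryset) is slow, the time is spent
# building model instances in Python, not in the DB or on the wire.
t0 = time.time()
with connection.cursor() as cursor:
    cursor.execute('SELECT * FROM todo_item '
                   'WHERE project_id = %s AND deleted_at IS NULL', [1])
    rows = cursor.fetchall()  # plain tuples, no model instances
t1 = time.time()

items = list(Item.objects.filter(project_id=1, deleted_at__isnull=True))
t2 = time.time()

print('raw cursor: %.3fs, ORM instances: %.3fs' % (t1 - t0, t2 - t1))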

  • *Now the problem for me is that all queries that are done are fast if executed manually* -- how did you measure this? With `explain analyze`? Note that the (only) problem with `explain analyze` is that it cannot account for network (or even unix socket) transfer times. If you are loading 12k "Items" + ~36k "Events" + ~18k "Photos" at once (and most of these entities are fat) you have to factor in a pretty high transfer time too. You've already heard the only good advice I can give: *do not load that many objects at once*. Maybe post your relevant logic: why do you *need all items at once*? – pozs Jun 09 '17 at 15:46
  • Yep, I tested timing locally with `\timing` and `EXPLAIN ANALYZE`. Not too detailed measurements or investigations there, but enough to see that the queries are not the problem. They are a couple hundred milliseconds tops. – Miika Jun 09 '17 at 15:58
  • @pozs Thanks again for the response. I did some extra tests and updated the original question. Transfer times are not significant with the related test data. AWS doesn't seem to have difficulties transferring query responses between the EC2 instance running Django and the RDS database fast enough. The original tests were done with a local Python virtualenv and PostgreSQL server. – Miika Jun 09 '17 at 21:24

1 Answer


It seems this won't be solved the way I hoped. Since I still have no idea why the in-Python join after prefetch_related takes such a long time, I replaced prefetch_related with separate QuerySets (producing the same SQL queries) and a simple join that fits this use case.

I replaced the Django prefetch_related join and DRF serialization in GET "Items" as follows:

  1. No prefetch_related; instead, separate QuerySets that result in the same SQL queries
  2. Using values(), since there is no need for model objects this time
  3. Building hash tables, keyed by "Item" ID, for the related items fetched with the separate queries
  4. Joining the related items into the output by looping through the "Items" once
  5. Mapping field names to match the API spec in that same loop (see the sketch below)
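
Roughly, the replacement looks like the following sketch (simplified; the .values() field names are illustrative, not the real models'):

from collections import defaultdict

project_id = 1  # example value

# Steps 1-2: separate .values() querysets instead of prefetch_related
items = list(Item.objects
             .filter(project_id=project_id, deleted_at__isnull=True)
             .values('id', 'list__name', 'created_by_id'))
item_ids = [item['id'] for item in items]

# Step 3: hash tables keyed by "Item" ID for the related rows
photos_by_item = defaultdict(list)
for photo in (Photo.objects
              .filter(item_id__in=item_ids, deleted_at__isnull=True)
              .values('id', 'item_id', 'created_by_id')):
    photos_by_item[photo['item_id']].append(photo)

events_by_item = defaultdict(list)
for event in (Event.objects
              .filter(item_id__in=item_ids, deleted_at__isnull=True)
              .values('id', 'item_id', 'created_by_id')):
    events_by_item[event['item_id']].append(event)

# Steps 4-5: one pass over items, joining related rows and renaming to the API spec
output = [{
    'id': item['id'],
    'listName': item['list__name'],
    'createdBy': item['created_by_id'],
    'photos': photos_by_item[item['id']],
    'events': events_by_item[item['id']],
} for item in items]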

While the normal prefetch_related + DRF serialization takes slightly over 1 minute to serialize e.g. 50k "Items", producing the same output with the 5 steps above takes about 10 seconds.

If someone has enough Django experience to point out why the 'default' prefetch + join in Python behaves so badly here, let me know. I know that using values(), for example, makes things faster here, but spending 1 minute to join 50k items is just wrong. Currently I assume that Django is cloning QuerySets while joining, or making some odd loops or queries that don't show up in the django.db logs. Somehow my data just doesn't fit the "optimal" use case Django has in mind.

I asked the same question in Google Groups: https://groups.google.com/forum/#!topic/django-users/je2LmRRjI_I
