I like the Django + DRF combination and I have used it for a long time, but this problem has bugged me for a while: queries + serialization take a lot of time when there are ManyToMany or OneToMany relations and nested objects. There are plenty of similar questions on Stack Overflow, and usually the cause has been some form of the "N + 1" problem (or the question was left unsolved).
e.g.
- Django REST Framework Serialize extremely slow
- Badly affecting performance in populating ManyToMany field values in rest api (using django rest framework)
- Django REST framework is slow
Also, the suggestion to not load that many objects at once probably applies here, though I need all the items at once.
Handling queries is a big part of the problem in this case as well, but there are not that many queries and the queries themselves are fast. I'm using prefetch_related to limit the number of queries, and from the DB queries I'm seeing everything looks okay..ish (?): I get one query for each prefetch_related property plus the original query for the serialized objects. The prefetch_related queries contain lots and lots of IDs, but I guess that is inevitable - as many IDs as there are original items.
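As a sanity check on that query count, something along these lines can be run locally (a rough sketch; it assumes DEBUG=True so Django records executed queries, and project_id=1 is just a placeholder):

from django.db import connection, reset_queries

reset_queries()
items = list(
    Item.objects.filter(project_id=1, deleted_at__isnull=True).prefetch_related(
        'created_by', 'photos', 'events__created_by', 'list'))
# 1 base query + 1 per prefetched relation (a nested lookup like events__created_by adds one more)
print(len(connection.queries))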
When profiling as shown here https://www.dabapps.com/blog/api-performance-profiling-django-rest-framework/ I get results where DB lookups take most of the time, while serialization is not too fast either. As an example, I have about 12k "Items" in my local PostgreSQL database for one "Project". All "Items" have 1-5 "Events", and the majority of "Items" also have 1-2 "Photos". Fetching and serializing that data takes around 22 seconds on my laptop. I'm using AWS EC2 + RDS for deployment and the timing is about the same there. On larger "Item" sets the serialization time grows faster than the DB lookup time, but the DB lookups always take most of the time. With 40k items you start to hit 1 min execution times and various timeouts from Nginx and other parts of the stack.
Example with 12k items (models, serializers and queries below)
Database lookup | 14.0292s
Serialization | 6.0076s
Django request/response | 0.3488s
API view | 0.2170s
Response rendering | 1.1092s
If I leave Photos and Events out, the results are
Database lookup | 1.2447s
Serialization | 3.9668s
Django request/response | 0.2435s
API view | 0.1320s
Response rendering | 0.8495s
So, the related fields (many=True) are taking most of the time. The profiling I used for testing makes a list out of the queryset before serializing, so the lazy queries are executed before serialization starts. If I don't do that, the overall results don't change; the DB lookups are just evaluated during serialization and take about the same amount of time.
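Roughly how I measured that breakdown (a minimal sketch in the spirit of the linked dabapps post; profile_item_listing is just an illustrative helper name, and view is an ItemListAPIView instance as defined further down):

import time

from rest_framework.renderers import JSONRenderer

def profile_item_listing(view):
    start = time.time()
    items = list(view.get_queryset())             # DB lookups (incl. prefetches) happen here
    db_time = time.time() - start

    start = time.time()
    data = ItemSerializer(items, many=True).data  # DRF serialization
    serialization_time = time.time() - start

    start = time.time()
    JSONRenderer().render(data)                   # response rendering
    render_time = time.time() - start

    return db_time, serialization_time, render_time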
Now, the problem for me is that all the queries that get executed are fast if run manually (and the times that the django.db.backends logs report are about the same); they are listed below. So I believe the SQL queries are fast, but the DB lookups from Django's point of view are very slow. What am I missing here, or how should I continue the investigation? It feels like it takes serious effort from Django to convert the SQL query results into model instances. Would that imply that there's something wrong with my models?
General questions to help with debugging cases like this:
- What is a reasonable time to expect for fetching and serializing 10-50k objects that have related items as lists (many=True)? Even if the DB lookups were better, 6 seconds of serialization sounds wrong, doesn't it?
- What happens between the SQL query finishing and DRF serialization starting? Is there something one can do to improve that part?
- It would probably be a good idea to not fetch everything from the DB but use values() instead. If I do that, is there a way to still utilize DRF serializers? I like them a lot. (See the sketch right after this list.)
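A rough sketch of what I mean by the values() idea (the field names are made up for the example, and the nested photos/events lists are exactly the part I don't know how to handle nicely this way):

from rest_framework import serializers

class ItemValuesSerializer(serializers.Serializer):
    # plain Serializer over the dicts returned by .values(); no model instances involved
    id = serializers.IntegerField()
    name = serializers.CharField()
    listName = serializers.CharField(source='list__name')

rows = Item.objects.filter(project_id=1, deleted_at__isnull=True).values('id', 'name', 'list__name')
data = ItemValuesSerializer(rows, many=True).data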
In the end I could turn to caching, but I would assume that handling < 100k objects should not be an issue if done correctly.
Setup: Python 2.7.13, Django 1.10.7, DRF 3.6.3
Simplified versions of models, views and serializers:
class List(models.Model):
    ... CharFields, DateTimeFields, ForeignKeys etc. ...

class Item(models.Model):
    list = models.ForeignKey(List, on_delete=models.CASCADE, db_index=True, null=True, related_name='items')
    deleted_at = models.DateTimeField(db_index=True, blank=True, null=True, default=None)
    created_by = models.ForeignKey(User, blank=False)
    project = models.ForeignKey('projects.Project', on_delete=models.CASCADE)
    ... other CharFields, DateTimeFields, ForeignKeys etc. ...

class Event(models.Model):
    item = models.ForeignKey(Item, on_delete=models.CASCADE, db_index=True, null=True, related_name='events')
    created_by = models.ForeignKey(User, blank=False)
    deleted_at = models.DateTimeField(db_index=True, blank=True, null=True, default=None)
    ... other CharFields, DateTimeFields, ForeignKeys etc. ...

class Photo(models.Model):
    item = models.ForeignKey(Item, on_delete=models.CASCADE, db_index=True, null=True, related_name='photos')
    created_by = models.ForeignKey(User, blank=False)
    deleted_at = models.DateTimeField(db_index=True, blank=True, null=True, default=None)
    ... other CharFields, DateTimeFields, ForeignKeys etc. ...
class PhotoSerializer(serializers.ModelSerializer):
    ... other CharFields, DateTimeFields, ForeignKeys etc. ...

class EventSerializer(serializers.ModelSerializer):
    createdBy = PrimaryKeyRelatedStringField(source='created_by', read_only=True)
    createdByFullName = serializers.CharField(source='created_by.get_full_name', read_only=True)
    ... other CharFields, DateTimeFields, ForeignKeys etc. ...

class ItemSerializer(serializers.ModelSerializer):
    listName = serializers.CharField(source='list.name', read_only=True)
    createdBy = PrimaryKeyRelatedStringField(source='created_by', read_only=True)
    createdByFullName = serializers.CharField(source='created_by.get_full_name', read_only=True)
    photos = PhotoSerializer(many=True, read_only=True)
    events = EventSerializer(many=True, required=False, allow_null=True)
    ... other fields ...
class ItemListAPIView(ListAPIView):
    model = Item
    serializer_class = ItemSerializer

    def get_queryset(self):
        return Item.objects.all().filter(project_id=...).filter(deleted_at__isnull=True).prefetch_related(
            'created_by',           # ID of user who created the item
            'photos',               # Photo properties
            'events__created_by',   # Full name of the person who created the event
            'list',                 # Name of the related list
        )
Example queries from tests with 14s DB lookup result:
django.db.backends: (0.196) SELECT "todo_item"."version", ... everything ... FROM "todo_item" WHERE ("todo_item"."project_id" = 1 AND "todo_item"."deleted_at" IS NULL) ORDER BY "todo_item"."created_at" DESC;
django.db.backends: (0.001) SELECT "auth_user"."id", ... everything ... FROM "auth_user" WHERE "auth_user"."id" IN (1, 2, ... some IDs ...);
django.db.backends: (0.148) SELECT "photos_photo"."version", ... everything ... FROM "photos_photo" WHERE ("photos_photo"."deleted_at" IS NULL AND "photos_photo"."item_id" IN (1, 2, ... lots of IDs... N)) ORDER BY "photos_photo"."created_at" DESC;
django.db.backends: (0.078) SELECT "events_event"."created_at", ... everything ... FROM "events_event" WHERE ("events_event"."deleted_at" IS NULL AND "events_event"."item_id" IN (1, 2, ... lots of IDs... N)) ORDER BY "events_event"."created_at" DESC, "events_event"."created_at" DESC;
django.db.backends: (0.157) SELECT "todo_list"."created_at", ... FROM "todo_list" WHERE "todo_list"."id" IN (1, 2, ... lots of IDs... N)
Update 1
@pozs asked about the timing measurements: timing was tested locally with \timing and EXPLAIN ANALYZE. Nothing too detailed, but together the queries take less than 1 second, about the same as the django.db.backends logs suggest. The tests with the 14-second DB lookup times were also done locally against a local database. Socket transfer times have an effect, that is true. The objects are fat, but not that fat: there are about 15 fields for "Items" and fewer than 10 for "Events" and "Photos". That generates a lot of data, but generating enough data for a > 10 s transfer locally doesn't sound quite right. Though I might be wrong there; I'll do some more testing. Thanks for the answer @pozs!
Update 2
To check whether it's the amount of transferred data that is slowing things down, I ran the same queries with psql from one of my EC2 instances against the RDS DB with the same data and wrote the output to a file. SQL query execution times were generally smaller than in my local setup, e.g. the Item SELECT query that took 196 ms locally took 65 ms on the EC2 + RDS setup. With data transfer plus the file write on top of that, it was 131 ms in total. The file write is extra, but I wanted to make sure I get all the data I expect.
I tested the timing from the EC2 instance with a command like:
time psql -f fat-query.psql --host=rds_host --port=5432 --username=user --dbname=dbname > output.txt
So, the SQL queries (including data transfer) are faster in the EC2 + RDS setup, but the whole query + serialization path is about as slow as in my local setup. Even though there is a lot of data to be moved, I still think the "problem" happens somewhere between "Django receives the response for the SQL query" and "DRF starts serializing".
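To try to isolate that part, I plan to compare the raw cursor time against full ORM evaluation of the main query, along these lines (a rough sketch: str(qs.query) is only an approximation of the real SQL, and the prefetches are left out to keep it simple):

import time

from django.db import connection

qs = Item.objects.filter(project_id=1, deleted_at__isnull=True)

start = time.time()
with connection.cursor() as cursor:
    cursor.execute(str(qs.query))   # raw SQL, no model instantiation
    rows = cursor.fetchall()
print('raw SQL + fetch: %.3fs, %d rows' % (time.time() - start, len(rows)))

start = time.time()
items = list(qs)                    # same query through the ORM, model instances built here
print('ORM evaluation: %.3fs, %d instances' % (time.time() - start, len(items)))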