1

I have a Django REST Framework -based project where one of the endpoints is super slow. Turns out it's an N+1 queries problem, where each of the returned hundreds of rows causes an additional query. The endpoint is not supposed to do any joins or additional queries, just return the contents of a single table, filtered a bit. The model for that table contains a few foreign keys, but none of the referents are supposed to be accessed, only the ID's are supposed to be rendered as JSON and sent to the client.

A bit of trial and error reveals that it's DRF's Serializer that causes the additional queries:

class FooSerializer(serializers.ModelSerializer):
    class Meta:
        model = Foo
        fields = ["data_field_a", "data_field_b", "bar_code", "qux_id"]

When I comment bar_code out in the serializer, the query becomes as fast as expected. What's surprising is thatqux_id, even though the field is basically identical to bar_code, doesn't cause such a slowdown. I read https://www.django-rest-framework.org/api-guide/relations/ and among other stuff, tried setting depth = 0 in the Meta class.

Also, I understand that using select_related or prefetch_related would essentially serve as a quick fix for the problem, but strictly saying, those shouldn't be needed for this case, as no joins or additional queries are actually warranted, and I'm trying to understand the root cause of the problem to deepen my understanding.

So, I have a good grasp of actual cause of the slowdown (additional queries by the serializer), but less so about why the serializer decides to do those queries for one field but not for the other. My question is, how should I go on debugging this? I have no prior experience in DRF.

Edit: I was asked to share the code of the model, so here we go. I removed all the other fields, as they didn't seem to have anything to do with the bug; I could still reproduce it:

class Foo(models.Model):
    bar_code = models.ForeignKey(
        Bar, to_field="bar_code", db_column="bar_code", on_delete=models.CASCADE
    )
    qux_id = models.ForeignKey(
        Qux, to_field="qux_id", db_column="qux_id", on_delete=models.CASCADE
    )

(Edit: as later turns out, the difference in these two is whether the field is the primary key of the referred model or not.)

Edit2: Even more minimal:


class FooSerializer(serializers.ModelSerializer):
    class Meta:
        model = Foo
        fields = ["bar_code"]
class Foo(models.Model):
    bar_code = models.ForeignKey(
        Bar, to_field="bar_code", db_column="bar_code", on_delete=models.CASCADE
    )
GolDDranks
  • 3,272
  • 4
  • 22
  • 30
  • It also came to my mind that it's kind of naive to expect DRF to have no bugs, so it might be just a bug? I'm not well versed in the framework, though, so I decided to leave the benefit of doubt that I'm just missing something or using it the wrong way. – GolDDranks Jun 03 '22 at 20:05
  • Can you share the code for model `Foo`? – Scratch'N'Purr Jun 03 '22 at 20:11
  • I'll share it. Pardon me, it takes a bit to clean it up. (I want to share code that I have tested to actually work, or in this case, not to work.) – GolDDranks Jun 03 '22 at 21:46
  • I'll try to have a minimized repo uploaded tomorrow. – GolDDranks Jun 03 '22 at 22:36
  • Hey! I don't know if it might be of use in your case but maybe `Django Debug Toolbar` helps you identify/get to to the root of the issue. It has an [SQL panel](https://django-debug-toolbar.readthedocs.io/en/latest/panels.html#sql) to show you the queries made and it works also with DRF (https://stackoverflow.com/questions/49862560/does-django-debug-toolbar-work-with-drf) which may come handy to check why Django is making those extra queries – mtzd Jun 03 '22 at 23:49
  • Thanks, I'm actually using DjDT, and noticed the problem from the SQL panel. However, I know that the queries happen, but what I do not know is that _why_ they happen. – GolDDranks Jun 04 '22 at 13:46
  • I'm also unable to reproduce the problem from a clean install, so I'm now going to clone the bugging project and try to simplify it down. – GolDDranks Jun 04 '22 at 13:47
  • Managed to reproduce it, here: https://github.com/golddranks/drf-demo The endpoint /make_foo creates the data, and the endpoint /foo demonstrates the performance problem: as DjDT shows there are 301 queries even thought there are supposed to be only 1. – GolDDranks Jun 04 '22 at 14:58

1 Answers1

1

So, I discovered my answer, and here's how I debugged it. Mainly three things:

  1. Inspecting what the Serializer does. https://www.django-rest-framework.org/api-guide/relations/#inspecting-relationships
poetry run python manage.py shell
...
>>> from app.views import FooSerializer
>>> print(repr(FooSerializer()))
FooSerializer():
    bar_code = SlugRelatedField(queryset=Bar.objects.all(), slug_field='unique_code')

This clearly shows, that it is considering the field as a reference, not just a plain value. This might be obvious to some Django developers, but it certainly was not for me, as the value we are trying to serialize is the value of the field on Foo, not a field of Bar. It seems that it goes by logic like "Fetch Bar using a foreign key to Bar's unique (non-PK) field. After having a Bar, read that field." But that seems counterintuitive and unneccessary to me, as we already had the value to access Bar to begin with!

  1. Doing a minimal reproduction of the problem: https://github.com/golddranks/drf-demo

This helped me to notice what was relevant about the problem and what was not. Playing with the reproduction also made me notice that the problem was not just about the Serializer, but just generally accessing the bar_code field. I realized that I didn't understand how to access the underlying value of that field, without triggering fetching the whole referenced model. By playing around, I also discovered that when bar_code is the primary key of Bar, it is actually smart enough to not load Bar, but just access the value of Foo's foreign key value directly!

  1. Asking around with the newfounded knowledge.

A friend send me a pull request that shows how to access the underlying value of the foreign key field: accessing the field as bar_code_id, with a suffix _id attached. What was surprising to me, was that the DB table doesn't have such a column! From what I gathered from the documentation (https://docs.djangoproject.com/en/dev/ref/models/fields/#database-representation), I had thought that accessing the underlying field in such way was possible in the case Django automatically created an underlying ID field, but in this case it didn't, we manually created it, and it certainly doesn't have the suffix _id! However bar_code_id works and manages to fetch the underlying value without triggering loading Bar!

GolDDranks
  • 3,272
  • 4
  • 22
  • 30
  • This is sneakily documented here: https://docs.djangoproject.com/en/4.0/topics/db/optimization/#use-foreign-key-values-directly – AKX Jun 05 '22 at 15:31