4

If A contains a set of B and B contains a set of C, then I am looking for a way to start with A and end up with a query set of C.

A simple example:

class Book(models.Model):
    name = models.CharField(max_length=64)


class Page(models.Model):
    number = models.IntegerField()
    book = models.ForeignKey(Book)


class Paragraph(models.Model):
    number = models.IntegerField()
    page = models.ForeignKey(Page)


def query():
    books = Book.objects.all()\
       .prefetch_related('page_set', 'page_set__paragraph_set')

    for book in books:
        pages = book.page_set

        # I need to do something like this
        paragraphs = pages.all().paragraph_set
        # invalid

        # or
        paragraphs = book.page_set.select_related('paragraph_set')
        # valid, but paragraphs is still a QuerySet of Pages

        # this works, but results in one query for EVERY book,
        # which is what I need to avoid
        paragraphs = Paragraph.objects.filter(page__book=book)


        # do stuff with the book
        #...


        # do stuff with the paragraphs in the book
        # ...

How do I get a query set of paragraphs from just a Book instance?

The named args syntax for Django queries supports an infinite level of nesting of sets/foreign key relationships, but I can't find a way to use the ORM mapping to actually get a related query set from the top-down.

And getting query-set from the bottom-up negates the benefits of prefetch_related/select_related.

The above example is a simplified version of what I need to do in my application. The database has thousands of "Books" and any n + 1 queries must be avoided.

I found a question about using prefetch across multiple levels, but the answer didn't address how to actually obtain the fetched query set for use.

Brian H.
  • 2,092
  • 19
  • 39
  • Just to be clear, you don't want the bottom-up approach like this: `Paragraph.objects.filter(page__book=the_book_instance)` ? – Ralf Oct 30 '18 at 23:13

2 Answers2

1

After you've done the prefetch, it seems like the only cheap way to access the child records is through all(). Any filters seem to trigger another database query.

The short answer to your question about all the paragraphs in a book is to use a list comprehension with two levels:

    paragraphs = [paragraph
                  for page in book.page_set.all()
                  for paragraph in page.paragraph_set.all()]

Here's a runnable example:

# Tested with Django 1.11.13
from __future__ import print_function
import os
import sys

import django
from django.apps import apps
from django.apps.config import AppConfig
from django.conf import settings
from django.core.files.base import ContentFile, File
from django.db import connections, models, DEFAULT_DB_ALIAS
from django.db.models.base import ModelBase

from django_mock_queries.mocks import MockSet, mocked_relations

NAME = 'udjango'


def main():
    setup()

    class Book(models.Model):
        name = models.CharField(max_length=64)

    class Page(models.Model):
        number = models.IntegerField()
        book = models.ForeignKey(Book)

    class Paragraph(models.Model):
        number = models.IntegerField()
        page = models.ForeignKey(Page)

    syncdb(Book)
    syncdb(Page)
    syncdb(Paragraph)

    b = Book.objects.create(name='Gone With The Wind')
    p = b.page_set.create(number=1)
    p.paragraph_set.create(number=1)
    b = Book.objects.create(name='The Three Body Problem')
    p = b.page_set.create(number=1)
    p.paragraph_set.create(number=1)
    p.paragraph_set.create(number=2)
    p = b.page_set.create(number=2)
    p.paragraph_set.create(number=1)
    p.paragraph_set.create(number=2)

    books = Book.objects.all().prefetch_related('page_set',
                                                'page_set__paragraph_set')

    for book in books:
        print(book.name)
        paragraphs = [paragraph
                      for page in book.page_set.all()
                      for paragraph in page.paragraph_set.all()]
        for paragraph in paragraphs:
            print(paragraph.page.number, paragraph.number)


def setup():
    DB_FILE = NAME + '.db'
    with open(DB_FILE, 'w'):
        pass  # wipe the database
    settings.configure(
        DEBUG=True,
        DATABASES={
            DEFAULT_DB_ALIAS: {
                'ENGINE': 'django.db.backends.sqlite3',
                'NAME': DB_FILE}},
        LOGGING={'version': 1,
                 'disable_existing_loggers': False,
                 'formatters': {
                    'debug': {
                        'format': '%(asctime)s[%(levelname)s]'
                                  '%(name)s.%(funcName)s(): %(message)s',
                        'datefmt': '%Y-%m-%d %H:%M:%S'}},
                 'handlers': {
                    'console': {
                        'level': 'DEBUG',
                        'class': 'logging.StreamHandler',
                        'formatter': 'debug'}},
                 'root': {
                    'handlers': ['console'],
                    'level': 'WARN'},
                 'loggers': {
                    "django.db": {"level": "DEBUG"}}})
    app_config = AppConfig(NAME, sys.modules['__main__'])
    apps.populate([app_config])
    django.setup()
    original_new_func = ModelBase.__new__

    @staticmethod
    def patched_new(cls, name, bases, attrs):
        if 'Meta' not in attrs:
            class Meta:
                app_label = NAME
            attrs['Meta'] = Meta
        return original_new_func(cls, name, bases, attrs)
    ModelBase.__new__ = patched_new


def syncdb(model):
    """ Standard syncdb expects models to be in reliable locations.

    Based on https://github.com/django/django/blob/1.9.3
    /django/core/management/commands/migrate.py#L285
    """
    connection = connections[DEFAULT_DB_ALIAS]
    with connection.schema_editor() as editor:
        editor.create_model(model)

main()

And here's the end of the output. You can see that it only runs one query for each table.

2018-10-30 15:58:25[DEBUG]django.db.backends.execute(): (0.000) SELECT "udjango_book"."id", "udjango_book"."name" FROM "udjango_book"; args=()
2018-10-30 15:58:25[DEBUG]django.db.backends.execute(): (0.000) SELECT "udjango_page"."id", "udjango_page"."number", "udjango_page"."book_id" FROM "udjango_page" WHERE "udjango_page"."book_id" IN (1, 2); args=(1, 2)
2018-10-30 15:58:25[DEBUG]django.db.backends.execute(): (0.000) SELECT "udjango_paragraph"."id", "udjango_paragraph"."number", "udjango_paragraph"."page_id" FROM "udjango_paragraph" WHERE "udjango_paragraph"."page_id" IN (1, 2, 3); args=(1, 2, 3)
Gone With The Wind
1 1
The Three Body Problem
1 1
1 2
2 1
2 2
Don Kirkby
  • 53,582
  • 27
  • 205
  • 286
  • thanks. I had cobbled together something similar, but the double `for` list comprehension is much cleaner than what I had. The only issue with this and @SamMason answer is that the result is a list and not a `QuerySet`. In my specific use, I'm passing this result to another method that expects a QuerySet parameter with methods like `.count()`, and `.filter()`. But after doing some research it looks like there is no way to append to a QuerySet, and combining QuerySets via `|` or `union()` calls the internal `_clone()` method, which does not copy the cache. So future operations re-run the query. – Brian H. Oct 31 '18 at 16:21
  • You can use duck typing, @BrianHVB. Write a class that holds the list of paragraph objects, and has `.count()` and `.filter()` methods that operate on that list. I've used the [django-mock-queries library](https://github.com/stphivos/django-mock-queries) for running Django tests without a database, but you could use a simpler version to stick to your cached data. However, it's probably simpler to modify the method that is expecting a QuerySet so that it will accept a list. – Don Kirkby Oct 31 '18 at 18:31
1

further to Don's answer, you can use Prefetch objects to apply any filters you want, e.g:

from django.db import models, connection

def query():
    paragraph_filter = models.Prefetch(
        'page_set__paragraph_set',
        Paragraph.objects.filter(number__gt=1))

    books = Book.objects.all().prefetch_related(
        'page_set', paragraph_filter)

    for book in books:
        for page in book.page_set.all():
            for paragraph in page.paragraph_set.all():
                print(paragraph)

    print(connection.queries)

Django takes care of ensuring that all appropriate objects gets loaded in a small number of queries (one per table, so you'll get three queries)

Sam Mason
  • 15,216
  • 1
  • 41
  • 60