102

suppose I have this model:

class PhotoAlbum(models.Model):
    title = models.CharField(max_length=128)
    author = models.CharField(max_length=128)

class Photo(models.Model):
    album = models.ForeignKey('PhotoAlbum')
    format = models.IntegerField()

Now if I want to look at a subset of photos in a subset of albums efficiently. I do it something like this:

someAlbums = PhotoAlbum.objects.filter(author="Davey Jones").prefetch_related("photo_set")
for a in someAlbums:
    somePhotos = a.photo_set.all()

This does only two queries, which is what I expect (one to get the albums, and then one like `SELECT * IN photos WHERE photoalbum_id IN ().

Everything is great.

But if I do this:

someAlbums = PhotoAlbum.objects.filter(author="Davey Jones").prefetch_related("photo_set")
for a in someAlbums:
    somePhotos = a.photo_set.filter(format=1)

Then it does a ton of queries with WHERE format = 1! Am I doing something wrong or is django not smart enough to realise it has already fetched all the photos and can filter them in python? I swear I read somewhere in the documentation that it is supposed to do that...

Benyamin Jafari
  • 27,880
  • 26
  • 135
  • 150
Timmmm
  • 88,195
  • 71
  • 364
  • 509
  • possible duplicate of [Filter on prefetch\_related in Django](http://stackoverflow.com/questions/10915319/filter-on-prefetch-related-in-django) – akaihola Nov 22 '12 at 15:16

2 Answers2

198

In Django 1.6 and earlier, it is not possible to avoid the extra queries. The prefetch_related call effectively caches the results of a.photoset.all() for every album in the queryset. However, a.photoset.filter(format=1) is a different queryset, so you will generate an extra query for every album.

This is explained in the prefetch_related docs. The filter(format=1) is equivalent to filter(spicy=True).

Note that you could reduce the number or queries by filtering the photos in python instead:

someAlbums = PhotoAlbum.objects.filter(author="Davey Jones").prefetch_related("photo_set")
for a in someAlbums:
    somePhotos = [p for p in a.photo_set.all() if p.format == 1]

In Django 1.7, there is a Prefetch() object that allows you to control the behaviour of prefetch_related.

from django.db.models import Prefetch

someAlbums = PhotoAlbum.objects.filter(author="Davey Jones").prefetch_related(
    Prefetch(
        "photo_set",
        queryset=Photo.objects.filter(format=1),
        to_attr="some_photos"
    )
)
for a in someAlbums:
    somePhotos = a.some_photos

For more examples of how to use the Prefetch object, see the prefetch_related docs.

djvg
  • 11,722
  • 5
  • 72
  • 103
Alasdair
  • 298,606
  • 55
  • 578
  • 516
11

From the docs:

...as always with QuerySets, any subsequent chained methods which imply a different database query will ignore previously cached results, and retrieve data using a fresh database query. So, if you write the following:

pizzas = Pizza.objects.prefetch_related('toppings') [list(pizza.toppings.filter(spicy=True)) for pizza in pizzas]

...then the fact that pizza.toppings.all() has been prefetched will not help you - in fact it hurts performance, since you have done a database query that you haven't used. So use this feature with caution!

In your case, "a.photo_set.filter(format=1)" is treated like a fresh query.

In addition, "photo_set" is a reverse lookup - implemented via a different manager altogether.

Community
  • 1
  • 1
Ngure Nyaga
  • 2,989
  • 1
  • 20
  • 30
  • `photo_set` can be prefetched, as well, with `.prefetch_related('photo_set')`. But order matters, as you have explained. – Risadinha Jul 03 '19 at 10:59