Django: How to count a queryset and return a slice without hitting DB twice?

Question

I have this part of the code in my API which recently has become a somewhat bottleneck:

total = results.count()
if request.GET.has_key('offset'):
    offset = int(request.GET.get('offset').strip())
    results = results.order_by('name')[100*offset:100*(offset+1)]
people = list(results)

Note that results is the queryset of all the people and offset is a param used for pagination.

Here I can see, when I print connection.queries, that my database getting hit twice by .count() and list(results). The reason why .count() has to be at the top because I need to the length of all the people (Not 100.) Is there a way to get around this?

[MySQL can do that](https://stackoverflow.com/a/2439870/1248008), but I don’t know how to make use of that using django and I am not sure whether other databases support this. — Jonas Schäfer, Jun 04 '15 at 20:22
@JonasWielicki It's debatable whether this will even speed up things much; see: https://www.percona.com/blog/2007/08/28/to-sql_calc_found_rows-or-not-to-sql_calc_found_rows/ – a better way to address the issue here might be to first understand why those queries are taking too long. Appropriate indices can get you a long way. — Thomas Orozco, Jun 04 '15 at 20:24
OP, have you benchmarked why those queries are taking too long? Trying to make one query when you need two might seem like the obvious solution, but that doesn't mean it's the only (or even the best) one. **When doing performance optimization you should trust benchmarks first, not intuition.** — Thomas Orozco, Jun 04 '15 at 20:25
@ThomasOrozco Thanks for the pointer. Haven’t been doing any performance critical DB stuff in years. — Jonas Schäfer, Jun 04 '15 at 20:26
Going off of what @ThomasOrozco said, https://docs.djangoproject.com/en/1.8/topics/db/optimization/#don-t-order-results-if-you-don-t-care - TLDR: add an index to the `name` if you will be using it for `order_by` — Michael B, Jun 05 '15 at 01:12

MrAlexBailey · Accepted Answer · 2015-06-04T20:35:09.747

-1

Maybe something like this?:

allpeople = list(results.order_by('name'))
total = len(allpeople)
if request.GET.has_key('offset'):
    offset = int(request.GET.get('offset').strip())
    results = allpeople[100*offset:100*(offset+1)]
people = results

Keep in mind that with people = results is going to fail if if request.GET....: doesn't fire.

edited Jun 04 '15 at 20:35

answered Jun 04 '15 at 20:29

MrAlexBailey

5,219
19
30

1

You could initialize results like `results = allpeople = list(results...` in your first line to remove the need for your final caveat. – Two-Bit Alchemist Jun 04 '15 at 20:40
9

I disagree with this answer because we do not know the number of results that are being returned. Wrapping a QuerySet in `list()` causes it to be evaluated, and brings all results into memory. If you have several million results, this can be catastrophic. This happened to me recently, so I am speaking from experience. – Michael B Jun 05 '15 at 01:17
You're correct. This is in fact slower as I did benchmark with million records. However, it totally answers the question of avoiding to hit the DB twice... Turned out @ThomasOrozco was right that fewer queries may not always be faster. Btw, the `name` index has always been there since the beginning. – Benz Jun 05 '15 at 19:46

Django: How to count a queryset and return a slice without hitting DB twice?

1 Answers1