13

I want to pass a query in Django to my PostgreSQL database. When I filter my query using a large array of ids, the query is very slow and goes up to 70s.

After looking for an answer I saw this post which gives a solution to my problem, simply change the ARRAY [ids] in IN statement by VALUES (id1), (id2), ....

I tested the solution with a raw query in pgadmin, the query goes from 70s to 300ms...

How can I do the same command (i.e. not using an array of ids but a query with VALUES) in Django?

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
poolet
  • 371
  • 4
  • 8

4 Answers4

6

I found a solution building on @erwin-brandstetter answer using a custom lookup

from django.db.models import Lookup
from django.db.models.fields import Field

@Field.register_lookup
class EfficientInLookup(Lookup):

    lookup_name = "ineff"

    def as_sql(self, compiler, connection):
        lhs, lhs_params = self.process_lhs(compiler, connection)
        rhs, rhs_params = self.process_rhs(compiler, connection)
        params = lhs_params + rhs_params
        return "%s IN (SELECT unnest(%s))" % (lhs, rhs), params

This allows to filter like this:

MyModel.objects.filter(id__ineff=<list-of-values>)
ilse2005
  • 11,189
  • 5
  • 51
  • 75
  • 1
    When passing a list to the standard `__in` lookup, Django 2.1 translates it in `IN (x, y, z)`. Also, the post quoted in the question has been updated with "need only be applied on 9.0. As of Postgres 9.3, ANY(ARRAY[...]) is performing properly again." So this might only be relevant for old versions of Django *and* Postgres. – Endre Both Mar 20 '19 at 21:43
  • 1
    But it's still nice to see working custom lookups that can serve as templates for other efforts! – Endre Both Mar 20 '19 at 21:47
  • 1
    Well, I think this is still a problem for large lists. For a list with 2k entries I could reduce the query time with the lookup above from 2s to 0,05s! And I'm using postgres 10 and django 2.1 – ilse2005 Mar 22 '19 at 09:04
  • That's definitely significant. What was the ORM call that resulted in the SQL query with ARRAY? – Endre Both Mar 22 '19 at 09:52
  • I cannot seem to replicate this. I got 15 ms for a 2k list and 250 ms for a 50k (!) list. – Endre Both Mar 23 '19 at 23:01
  • I think the problem is only if you have a field that is not a db index. – ilse2005 Mar 24 '19 at 02:05
  • Ah, I would have thought that without an index you're toast anyway. The DB seems to create an indexed temp table for the quicker approach. It would be interesting to see the `EXPLAIN`s for both versions. – Endre Both Mar 24 '19 at 20:34
3

The trick is to transform the array to a set somehow.

Instead of (this form is only good for a short array):

SELECT *
FROM   tbl t
WHERE  t.tbl_id = ANY($1);
-- WHERE  t.tbl_id IN($1);  -- equivalent

$1 being the array parameter.

You can still pass an array like you had it, but unnest and join. Like:

SELECT *
FROM   tbl t
JOIN   unnest($1) arr(id) ON arr.id = t.tbl_id;

Or you can keep your query, too, but replace the array with a subquery unnesting it:

SELECT * FROM tbl t
WHERE  t.tbl_id = ANY (SELECT unnest($1));

Or:

SELECT * FROM tbl t
WHERE  t.tbl_id IN    (SELECT unnest($1));

Same effect for performance as passing a set with a VALUES expression. But passing the array is typically much simpler.

Detailed explanation:

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
1

Is this an example of the first thing you're asking?

relation_list = list(ModelA.objects.filter(id__gt=100))
obj_query = ModelB.objects.filter(a_relation__in=relation_list)

That would be an "IN" command because you're first evaluating relation_list by casting it to a list, and then using it in your second query.

If instead you do the exact same thing, Django will only make one query, and do SQL optimization for you. So it should be more efficient that way.

You can always see the SQL command you'll be executing with obj_query.query if you're curious what's happening under the hood.

Hope that answers the question, sorry if it doesn't.

Sam Bobel
  • 1,784
  • 1
  • 15
  • 26
0

I had lots of trouble to make the custom lookup 'ineff' work. I may have solved it, but would love some validation from Django and Postgres experts.

1) Using it 'directly' on a ForeignKey field (ModelB)

ModelA.objects.filter(ModelB__ineff=queryset_ModelB)

Throws the following exception: "Related Field got invalid lookup: ineff"

ForeignKey fields cannot be used with custom lookups.

A similar issue is reported here: Custom lookup is not being registered in Django

2) Using it 'indirectly' on the pk field of related model (ModelB.id)

ModelA.objects.filter(ModelB__id__ineff=queryset_ModelB.values_list('id', flat=True))

Throws the following exception: "can only concatenate list (not "tuple") to list"

Looking at Django Traceback, I noticed that rhs_params is a tuple. Yet we try to add it to lhs_params (a list) in our custom lookup.

Hence I changed:

params = lhs_params + rhs_params

into:

params = lhs_params + list(rhs_params)

3) I then got a Postgres error (at least I had passed Django ORM) "function unnest(uuid) does not exist" "HINT: No function matches the given name and argument types. You might need to add explicit type casts."

I apparently solved it by changing the sql:

from:

return "%s IN (SELECT unnest(%s))" % (lhs, rhs), params

to:

return "%s IN (SELECT unnest(ARRAY(%s)))" % (lhs, rhs), params    

Hence my final as_sql method looks like this:

def as_sql(self, compiler, connection):
    lhs, lhs_params = self.process_lhs(compiler, connection)
    rhs, rhs_params = self.process_rhs(compiler, connection)
    params = lhs_params + list(rhs_params)
    return "%s IN (SELECT unnest(ARRAY(%s)))" % (lhs, rhs), params

It seems to work, and is indeed faster than in__ (tested with EXPLAIN ANALYZE in Postgres). But I would love to have some validation from experts, perhaps Erwin Brandstetter? Thanks for your input.

edyas
  • 1
  • 1
  • 2