9

The Problem

I'm trying to use the Django ORM to do the equivalent of a SQL NOT IN clause, providing a list of IDs in a subselect to bring back a set of records from the logging table. I can't figure out if this is possible.

The Model

class JobLog(models.Model):
    job_number = models.BigIntegerField(blank=True, null=True)
    name = models.TextField(blank=True, null=True)
    username = models.TextField(blank=True, null=True)
    event = models.TextField(blank=True, null=True)
    time = models.DateTimeField(blank=True, null=True)

What I've Tried

My first attempt was to use exclude, but this does NOT to negate the entire Subquery, rather than the desired NOT IN:

query = (
    JobLog.objects.values(
        "username", "job_number", "name", "time",
    )
    .filter(time__gte=start, time__lte=end, event="delivered")
    .exclude(
        job_number__in=models.Subquery(
            JobLog.objects.values_list("job_number", flat=True).filter(
                time__gte=start, time__lte=end, event="finished",
            )
        )
    )
)

Unfortunately, this yields this SQL:

SELECT "view_job_log"."username", "view_job_log"."group", "view_job_log"."job_number", "view_job_log"."name", "view_job_log"."time"
FROM "view_job_log"
WHERE (
    "view_job_log"."event" = 'delivered'
    AND "view_job_log"."time" >= '2020-03-12T11:22:28.300590+00:00'::timestamptz
    AND "view_job_log"."time" <= '2020-03-13T11:22:28.300600+00:00'::timestamptz
    AND NOT (
        "view_job_log"."job_number" IN (
            SELECT U0."job_number"
            FROM "view_job_log" U0
            WHERE (
                U0."event" = 'finished' AND U0."time" >= '2020-03-12T11:22:28.300590+00:00'::timestamptz
                AND U0."time" <= '2020-03-13T11:22:28.300600+00:00'::timestamptz
            )
        )
        AND "view_job_log"."job_number" IS NOT NULL
    )
)

What I need is for the third AND clause to be AND "view_job_log"."job_number" NOT IN instead of the AND NOT (.

I've also tried doing the sub-select as it's own query first, with an exclude, as suggested here:

Django equivalent of SQL not in

However, this yields the same problematic result. Then I tried a Q object, which yields a similar query:

query = (
    JobLog.objects.values(
        "username", "subscriber_code", "job_number", "name", "time",
    )
    .filter(
        ~models.Q(job_number__in=models.Subquery(
            JobLog.objects.values_list("job_number", flat=True).filter(
                time__gte=start, time__lte=end, event="finished",
            )
        )),
        time__gte=start,
        time__lte=end,
        event="delivered",
    )
)

This attempt with the Q object yields the following SQL, again, without the NOT IN:

SELECT "view_job_log"."username", "view_job_log"."group", "view_job_log"."job_number", "view_job_log"."name", "view_job_log"."time"

FROM "view_job_log" WHERE (
    NOT (
        "view_job_log"."job_number" IN (
            SELECT U0."job_number"
            FROM "view_job_log" U0
            WHERE (
                U0."event" = 'finished'
                AND U0."time" >= '2020-03-12T11:33:28.098653+00:00'::timestamptz
                AND U0."time" <= '2020-03-13T11:33:28.098678+00:00'::timestamptz
            )
        )
        AND "view_job_log"."job_number" IS NOT NULL
    )
    AND "view_job_log"."event" = 'delivered'
    AND "view_job_log"."time" >= '2020-03-12T11:33:28.098653+00:00'::timestamptz
    AND "view_job_log"."time" <= '2020-03-13T11:33:28.098678+00:00'::timestamptz
)

Is there any way to get Django's ORM to do something equivalent to AND job_number NOT IN (12345, 12346, 12347)? Or am I going to have to drop to raw SQL to accomplish this?

Thanks in advance for reading this entire wall-of-text question. Explicit is better than implicit. :)

FlipperPA
  • 13,607
  • 4
  • 39
  • 71
  • 1
    Your model is missing the `event` field – Lotram Mar 13 '20 at 15:20
  • @Lotram Ah, I had deleted the non-relevant fields from the model and got a wee bit overzealous. Thank you for the catch! – FlipperPA Mar 13 '20 at 18:52
  • 1
    Not sure if this is expected but since `JobLog.job_number` is nullable your `NOT IN` won't pass as soon as your subquery returns a `NULL` (https://stackoverflow.com/questions/129077/null-values-inside-not-in-clause). That's exactly what the `"view_job_log"."job_number" IS NOT NULL` SQL added by the ORM tries to protect against when you do `exclude(field__in)` but you'll be on your own if you use your own `notin` lookup. – Simon Charette Mar 14 '20 at 22:20

3 Answers3

8

I think the easiest way to do this would be to define a custom lookup, similar to this one or the in lookup

from django.db.models.lookups import In as LookupIn

class NotIn(LookupIn):
    lookup_name = "notin"

    def get_rhs_op(self, connection, rhs):
        return "NOT IN %s" % rhs

Field.register_lookup(NotIn)

or

class NotIn(models.Lookup):
    lookup_name = "notin"

    def as_sql(self, compiler, connection):
        lhs, params = self.process_lhs(compiler, connection)
        rhs, rhs_params = self.process_rhs(compiler, connection)
        params.extend(rhs_params)

        return "%s NOT IN %s" % (lhs, rhs), params

then use it in your query:

query = (
    JobLog.objects.values(
        "username", "job_number", "name", "time",
    )
    .filter(time__gte=start, time__lte=end, event="delivered")
    .filter(
        job_number__notin=models.Subquery(
            JobLog.objects.values_list("job_number", flat=True).filter(
                time__gte=start, time__lte=end, event="finished",
            )
        )
    )
)

this generates the SQL:

SELECT
    "people_joblog"."username",
    "people_joblog"."job_number",
    "people_joblog"."name",
    "people_joblog"."time"
FROM
    "people_joblog"
WHERE ("people_joblog"."event" = delivered
    AND "people_joblog"."time" >= 2020 - 03 - 13 15:24:34.691222 + 00:00
    AND "people_joblog"."time" <= 2020 - 03 - 13 15:24:41.678069 + 00:00
    AND "people_joblog"."job_number" NOT IN (
        SELECT
            U0. "job_number"
        FROM
            "people_joblog" U0
        WHERE (U0. "event" = finished
            AND U0. "time" >= 2020 - 03 - 13 15:24:34.691222 + 00:00
            AND U0. "time" <= 2020 - 03 - 13 15:24:41.678069 + 00:00)))
Lotram
  • 729
  • 6
  • 17
  • Thank you, @Lotram! Works like a charm. I'm pondering a `notin` PR to Django. :) – FlipperPA Mar 13 '20 at 19:02
  • @Lotram, this is good, but it does not function as intended. You will get different SQL as opposed to ~Q, and you will get wrong results when for example performing a `job_number__notin=[]`. – coredumped0x Dec 31 '21 at 12:18
  • @MurphyAdam the question was indeed about not generating the same SQL, but rather introduce a `NOT IN` clause in the generated SQL, so that's what my answer does. I haven't tested every edge case, so it might not always work as intended. If you want to be safer, you can use @simon-charrette 's solution, as a django core developer, he probably knows best – Lotram Jan 04 '22 at 11:31
1

You can likely achieve the same results by using an Exists and special casing NULLs.

.filter(
   ~Exists(
       JobLog.objects.filter(
           Q(jobnumber=None) | Q(jobnumber=OuterRef('jobnumber')),
           time__gte=start,
           time__lte=end,
           event='finished',
       )
   )
)
Simon Charette
  • 5,009
  • 1
  • 25
  • 33
  • 1
    I appreciate the help, but isn't that quite a bit of gymnastics for what should be a fairly straightforward ORM operation? – FlipperPA Mar 14 '20 at 22:47
0

Can you try this:

JobLog.objects.filter(time__gte=start, time__lte=end, event="delivered").exclude(time__gte=start, event='finished').exclude(time__lte=end, event='finished')
Yellowduck
  • 482
  • 2
  • 4