0

I'm using distinct() QuerySet to get some data in Django.
My initial query was Point.objects.order_by('chron', 'pubdate').
The field chron in some cases is a duplicate so I changed the query to Point.objects.order_by('chron', 'pubdate').distinct('chron') in order to exclude duplicates. Now the problem is that all empty fields are considered duplicates.

To be accurate, the chron field contain integers (which behave similar to ids), in some cases it can be a duplicate, in some cases it can be NULL.

| chron |
|-------|
| 1     | I want this
| 2     | I want this
| 3     | I want this
| 3     |
| NULL  |
| 4     | I want this
| NULL  |

I want to exclude all the chron duplicates but not if they are duplicate of NULL. Thank you.

isar
  • 1,661
  • 1
  • 20
  • 39
  • check this. https://stackoverflow.com/questions/30084107/django-query-with-order-by-distinct-and-limit-on-postgresql. then do exclude to filter null result. – Windsooon Aug 23 '16 at 04:32

1 Answers1

2

Use two separate queries.

  • .distinct("chron").exclude(chron__isnull=True)

  • .filter() for only chron values where chron__isnull=True.

Although this seems pretty inefficient I believe (I will happily be corrected) that even any sensible vanilla SQL statement (eg. below) would require multiple table scans to join a result set of nulls and unique values.

SELECT *
FROM (
    SELECT chron
    FROM Point
    WHERE chron IS NOT NULL  # .exclude()
    GROUP BY chron  # .distinct()

    UNION ALL

    SELECT chron
    FROM Point
    WHERE chron IS NULL  # .include()
)
matthew.
  • 176
  • 5
  • why would you do `GROUP BY` instead of `SELECT DISTINCT`? – Sardorbek Imomaliev Aug 23 '16 at 05:10
  • Thanks, it worked with a little adjustment: `chron__isnull` instead of `chron__is_null`, I think because `chron` is an `IntegerField`. – isar Aug 23 '16 at 13:14
  • I'll update the answer with the corrected syntax. Make sure to mark you question as answered. – matthew. Aug 23 '16 at 14:39
  • @SardorbekImomaliev I looked into this since I originally did it out of habit. As per this answer here (http://stackoverflow.com/questions/581521/whats-faster-select-distinct-or-group-by-in-mysql) DISTINCT is better to use when grouping by a non-indexed column. – matthew. Aug 29 '16 at 22:22