13

Given this FruitBasket model,

class FruitBasket(Model):
    fruit = CharField(max_length=128)
    count = PositiveIntegerField()

And this sample data,

id      fruit        count
-----   ----------   -----
0       apple        10
1       banana       20
2       apple        5
3       banana       30

I want a django query that returns the following items:

[ (2, apple, 5), (3, banana, 30) ]

Essentially, grabbing the "latest" row per fruit (I've simplified the timestamp to rowid in this example.)

rrauenza
  • 6,285
  • 4
  • 32
  • 57
  • Seems related to http://stackoverflow.com/questions/612231/how-can-i-select-rows-with-maxcolumn-value-distinct-by-another-column-in-sql?rq=1 – rrauenza May 30 '14 at 17:49

4 Answers4

13

https://docs.djangoproject.com/en/dev/ref/models/querysets/#distinct

q = FruitBasket.objects.distinct('fruit')

will only work if you are using postgres.

On PostgreSQL only, you can pass positional arguments (*fields) in order to specify the names of fields to which the DISTINCT should apply. This translates to a SELECT DISTINCT ON SQL query. Here’s the difference. For a normal distinct() call, the database compares each field in each row when determining which rows are distinct. For a distinct() call with specified field names, the database will only compare the specified field names.

Additionally, you would have to specify an order_by and it couldn't be by the timestamp:

q = FruitBasket.objects.distinct('fruit').order_by('fruit')

When you specify field names, you must provide an order_by() in the QuerySet, and the fields in order_by() must start with the fields in distinct(), in the same order.

For example, SELECT DISTINCT ON (a) gives you the first row for each value in column a. If you don’t specify an order, you’ll get some arbitrary row.

however, values might get you closer if you could kill the requirement: distinct/order_by having the same values in the same order.

q = (
    FruitBasket.objects
    .values('id', 'fruit', 'count')
    .distinct('fruit').order_by('-id')
)

realistically, sometimes its better to break out of the ORM

SELECT id, fruit, count 
FROM FruitBasket
GROUP BY fruit  
ORDER BY id DESC

bad query

so this query doesn't magic...

SELECT * FROM (SELECT id, fruit, count 
FROM FruitBasket
ORDER BY id DESC) t
GROUP BY t.fruit

better but still horrid

this ones better but kinda ugly.

optimize this on your own:

q = FruitBasket.objects.raw("""\
    SELECT * FROM 
    (
        SELECT id, fruit, count 
        FROM FruitBasket 
        ORDER BY id DESC
    ) t
    GROUP BY t.fruit
""")
dhill
  • 3,327
  • 2
  • 19
  • 18
Francis Yaconiello
  • 10,829
  • 2
  • 35
  • 54
  • Wasn't sure if you meant "q = FruitBasket.objects.values('id', 'fruit', 'count').distinct('fruit').order_by('-id')" should work, or just it is an ideal if it did. It doesn't. I may just refactor to keep a foreign key to the "latest" entry from each "Fruit." (In reality, 'Fruit' is a ForeignKey in my real world problem.) – rrauenza May 30 '14 at 23:04
  • 2
    I meant that it would be ideal if it did work, the only working solution is the one at the bottom. the top explains why the orm fails to do what you need. – Francis Yaconiello Jun 01 '14 at 01:39
3

You can try this:

FruitBasket.objects.order_by('fruit', '-count').distinct('fruit')

In my case it worked for Django 2.1

  • As mentioned in the documentation and the other answer, this will work only with PostgreSQL `On PostgreSQL only, you can pass positional arguments (*fields) in order to specify the names of fields to which the DISTINCT should apply` – The Godfather Mar 19 '19 at 12:59
1

As an alternative, if you have fixed (small) amount of possible distinct values, you can use several queries (not really optimal, but should work for small projects):

available_fruits = ['banana', 'apple'] # can be also an extra query to extract distinct values
fruits = [FruitBasket.objects.filter(fruit=x).latest('id') for x in available_fruits ]

In my case it was only 4 values, so I'm ok with making 4 simple and fast queries.

The Godfather
  • 4,235
  • 4
  • 39
  • 61
1

Subquery might help you out here, an example from the docs:

>>> from django.db.models import OuterRef, Subquery
>>> newest = Comment.objects.filter(post=OuterRef('pk')).order_by('-created_at')
>>> Post.objects.annotate(newest_commenter_email=Subquery(newest.values('email')[:1]))
elad silver
  • 9,222
  • 4
  • 43
  • 67