19

I have a database table named 'student' in which there is one column named 'marks'. I want the student record with highest marks in Maths. There is a simple solution to it using order_by()[0]:

Student.objects.filter(subject='Maths').order_by('-marks')[0]

But this sorts the table and then fetches me the first record. If my table is huge, this is redundant as I need only the max record. Is there any way to just get the largest value without sorting?

I want the whole object, not just the max value.

Thanks Anuj

Mariusz Jamro
  • 30,615
  • 24
  • 120
  • 162
Anuj
  • 1,203
  • 3
  • 13
  • 20

5 Answers5

30

The SQL required would be something like this:

SELECT *
FROM STUDENT
WHERE marks = (SELECT MAX(marks) FROM STUDENT)

To do this via Django, you can use the aggregation API.

max_marks = Student.objects.filter(
    subject='Maths'
).aggregate(maxmarks=Max('marks'))['maxmarks']
Student.objects.filter(subject='Maths', marks=max_marks)

Unfortunately, this query is actually two queries. The max mark aggregation is executed, the result pulled into python, then passed to the second query. There's (surprisingly) no way to pass a queryset that's just an aggregation without a grouping, even though it should be possible to do. I'm going to open a ticket to see how that might be fixed.

Edit:

It is possible to do this with a single query, but it's not very obvious. I haven't seen this method elsewhere.

from django.db.models import Value

max_marks = (
    Student.objects
           .filter(subject='Maths')
           .annotate(common=Value(1))
           .values('common')
           .annotate(max_marks=Max('marks'))
           .values('max_marks')
)

Student.objects.filter(subject='Maths', marks=max_marks)

If you print this query in the shell you get:

SELECT 
       "scratch_student"."id", 
       "scratch_student"."name", 
       "scratch_student"."subject", 
       "scratch_student"."marks" 
  FROM "scratch_student" 
 WHERE ( 
       "scratch_student"."subject" = Maths 
   AND "scratch_student"."marks" = (
       SELECT 
              MAX(U0."marks") AS "max_marks" 
         FROM "scratch_student" U0 
        WHERE U0."subject" = Maths))

Tested on Django 1.11 (currently in alpha). This works by grouping the annotation by the constant 1, which every row will group into. We then strip this grouping column from the select list (the second values(). Django (now) knows enough to determine that the grouping is redundant, and eliminates it. Leaving a single query with the exact SQL we needed.

yofee
  • 1,287
  • 12
  • 25
Josh Smeaton
  • 47,939
  • 24
  • 129
  • 164
  • @ChadVernon, they are called F() Expressions. They allow you to use the value of another column. https://docs.djangoproject.com/en/dev/topics/db/queries/#query-expressions – Josh Smeaton Jan 16 '13 at 21:08
  • 2
    This doesn't work. The resulting SQL is something like: `SELECT *, MAX("mark") AS "max_mark" FROM STUDENT HAVING "STUDENT"."mark" = (MAX("STUDENT"."marks"))`, which simply selects all students. – Steffen Nov 18 '16 at 19:55
  • You're right, I should have tested this before writing an answer. Updating with a working answer. – Josh Smeaton Dec 08 '16 at 23:18
2

If you meant that you need one record with the highest mark in Maths, I think the use of SQL LIMIT is more obvious and preferable:

Student.objects.filter(subject='Maths').order_by('-marks')[:1].get()

The SQL will be like this:

SELECT * FROM student WHERE subject = 'Maths' ORDER BY marks DESC LIMIT 1

But keep in mind that there may be several students with the highest mark and you will get a random one.

dtatarkin
  • 1,201
  • 8
  • 6
0

With a naive database table, there is theoretically no possible way the database can retrieve the max value for you without first sorting. Just think about it, how can the database know which is the max value unless it looked at every single row?

Of course, that's with a very naive setup. Luckily you have two options available:

  1. use an index. If you create an index on that column, sorting can usually take advantage of the index - saving you a full table scan.

  2. normalize (aka precompute). Create another table somewhere that stores the max value, and make sure you check/update it every time a Student object is added/modified/deleted.

Without knowing more of the requirements, I strongly suggest using the index.

Check out: https://docs.djangoproject.com/en/dev/ref/models/fields/#db-index

shu zOMG chen
  • 436
  • 3
  • 8
  • This question seems to be more about the Django ORM and not really about database or table structures, which your answer is oriented towards. – TheCatParty Oct 14 '15 at 19:04
  • 1
    Theoretically, finding the max is a o(n) operation - you just have to visit every element once which is different from sorting. For sorting you need to make o(nlogn) comparisons. – abhaga Nov 23 '15 at 10:37
  • @abhaga The primary goal of databases, in my opinion, is to allow fast data access. By creating an index, the database is kept sorted so that accessing the maximum (or minimum, or offset) is an O(1) operation. – Rushy Panchal Aug 08 '16 at 14:58
  • 2
    @rushy I agree using an index is a useful solution. I was referring to the first line of the answer that says there is no way to find maximum without sorting first. Also index is more like a random access key which quickly tells you where to look. It does not keep the table sorted. – abhaga Sep 01 '16 at 04:15
0
from django.db.models import Max

temp = Student.objects.filter(subject='Math').aggregate(Max('marks'))
record = Student.objects.filter(Q(subject='Math') & Q(subject=temp['marks__max']))
Cagatay Barin
  • 3,428
  • 2
  • 24
  • 43
-1

This question can be helpfull to you: How to do SELECT MAX in Django?

Just use aggregation.

from django.db.models import Max
Student.objects.filter(subject='Math').aggregate(Max('marks'))

Not tested, but should work. :)

Community
  • 1
  • 1
zeroos
  • 2,094
  • 3
  • 17
  • 24
  • 9
    OP wants the full row, not just the max value, *"I want the student record with highest marks in Maths"* - OP – Munim Munna Dec 24 '18 at 14:05