I'm writing a complex django data querier and to speed up returns, I'm using values() along with filter() and aggregate, and am having some issues with duplicate results.
Picture a models.py
like so:
class Person(models.Model):
name= CharField()
class Question(models.Model):
title = CharField()
date_asked = DateField()
asker = ForeignKey(person)
What I'm trying to do is query django to get the name of a person and the title of their most recent question, using the Person
queryset and using values()
.
If we have the following sample data:
Person | Title | Date
----------------------------------------------
Jack | Where can I get water? | 2011-01-04
Jack | How to climb hill? | 2012-02-05
Jill | How to fix head injury? | 2014-03-06
I can get most of the way with this, like so:
List of peoples names name and date of most recent question:
Person.objects.values('name','most_recent')\\
.annotate('most_recent'=Max('question__date_asked'))
Person | most_recent
--------------------
Jack | 2012-02-05
Jill | 2014-03-06
List of peoples names name and all their questions and their titles:
Person.objects.values('name','question__title','question__date_asked')
Person | Title | Date
----------------------------------------------
Jack | Where can I get water? | 2011-01-04
Jack | How to climb hill? | 2012-02-05
Jill | How to fix head injury? | 2014-03-06
But when I try and put them together:
Person.objects.values('name','question__title','most_recent')\\
.annotate('most_recent'=Max('question__date_asked'))
.filt
Person | Title | most_recent
----------------------------------------------
Jack | Where can I get water? | 2011-01-04
Jack | How to climb hill? | 2012-02-05
Jill | How to fix head injury? | 2014-03-06
Even using an F() expression doesn't fix things:
Person.objects.values('name','question__title','most_recent')\\
.annotate('most_recent'=Max('question__date_asked'))
.filter('question__date_asked'=F('most_recent'))
Person | Title | most_recent
----------------------------------------------
Jack | Where can I get water? | 2011-01-04
Jack | How to climb hill? | 2012-02-05
Jill | How to fix head injury? | 2014-03-06
Note: In the above table, the "Max" date is given for each relation, not for each Person.
What I need is:
Person | Title | most_recent
----------------------------------------------
Jack | How to climb hill? | 2012-02-05
Jill | How to fix head injury? | 2014-03-06
Something in the ordering of the statements, and joins means that when using filters, aggregates and values at the same time means that the join happens before the SQL USING statement which should be restricting the return rows.
Any ideas on how I can perform this query?
Update:
The relevant SQL query looks like this:
SELECT "example_person"."full_name", "example_question"."title",
MAX("example_question"."date_asked") AS "max___example_question__date_asked"
FROM "example_person"
LEFT OUTER JOIN
"example_question" ON ( "example_person"."id" = "example_question"."person_id" )
INNER JOIN
"example_question" T3 ON ( "example_person"."id" = T3."person_id" )
GROUP BY
"example_person"."full_name", T3."start_date",
"example_person"."id", "example_question"."title"
HAVING
T3."date_asked" = (MAX("example_person"."date_asked"))
This issue is with djangos over specificity with the GROUP BY
statement. If I run ./manage.py dbshell
and run the query above, I get the redundent results, but if I restrict it to GROUP BY "example_person"."full_name"
with no other groupings I get the right results.
Is there a way to restrict django's GROUP BY
or some kind of monkey patch just to restrict it a little bit?