2

I'm writing a complex django data querier and to speed up returns, I'm using values() along with filter() and aggregate, and am having some issues with duplicate results.

Picture a models.py like so:

class Person(models.Model):
    name= CharField()

class Question(models.Model):
    title = CharField()
    date_asked = DateField()
    asker = ForeignKey(person)

What I'm trying to do is query django to get the name of a person and the title of their most recent question, using the Person queryset and using values().

If we have the following sample data:

Person | Title                    | Date
----------------------------------------------
Jack   | Where can I get water?   | 2011-01-04
Jack   | How to climb hill?       | 2012-02-05
Jill   | How to fix head injury?  | 2014-03-06

I can get most of the way with this, like so:

List of peoples names name and date of most recent question:

Person.objects.values('name','most_recent')\\
              .annotate('most_recent'=Max('question__date_asked'))

Person | most_recent
--------------------
Jack   | 2012-02-05
Jill   | 2014-03-06

List of peoples names name and all their questions and their titles:

Person.objects.values('name','question__title','question__date_asked')

Person | Title                    | Date
----------------------------------------------
Jack   | Where can I get water?   | 2011-01-04
Jack   | How to climb hill?       | 2012-02-05
Jill   | How to fix head injury?  | 2014-03-06

But when I try and put them together:

Person.objects.values('name','question__title','most_recent')\\
              .annotate('most_recent'=Max('question__date_asked'))
              .filt

Person | Title                    | most_recent
----------------------------------------------
Jack   | Where can I get water?   | 2011-01-04
Jack   | How to climb hill?       | 2012-02-05
Jill   | How to fix head injury?  | 2014-03-06

Even using an F() expression doesn't fix things:

Person.objects.values('name','question__title','most_recent')\\
              .annotate('most_recent'=Max('question__date_asked'))
              .filter('question__date_asked'=F('most_recent'))

Person | Title                    | most_recent
----------------------------------------------
Jack   | Where can I get water?   | 2011-01-04
Jack   | How to climb hill?       | 2012-02-05
Jill   | How to fix head injury?  | 2014-03-06

Note: In the above table, the "Max" date is given for each relation, not for each Person.

What I need is:

Person | Title                    | most_recent
----------------------------------------------
Jack   | How to climb hill?       | 2012-02-05
Jill   | How to fix head injury?  | 2014-03-06

Something in the ordering of the statements, and joins means that when using filters, aggregates and values at the same time means that the join happens before the SQL USING statement which should be restricting the return rows.

Any ideas on how I can perform this query?


Update:

The relevant SQL query looks like this:

SELECT "example_person"."full_name", "example_question"."title",
       MAX("example_question"."date_asked") AS "max___example_question__date_asked"
FROM "example_person"
  LEFT OUTER JOIN
     "example_question" ON ( "example_person"."id" = "example_question"."person_id" )
  INNER JOIN
     "example_question" T3 ON ( "example_person"."id" = T3."person_id" )
GROUP BY
     "example_person"."full_name", T3."start_date",
     "example_person"."id", "example_question"."title"
HAVING
     T3."date_asked" = (MAX("example_person"."date_asked"))

This issue is with djangos over specificity with the GROUP BY statement. If I run ./manage.py dbshell and run the query above, I get the redundent results, but if I restrict it to GROUP BY "example_person"."full_name" with no other groupings I get the right results.

Is there a way to restrict django's GROUP BY or some kind of monkey patch just to restrict it a little bit?

2 Answers2

1

Depending on your backend you should be able to accomplish this with an order_by and a distinct like this:

Question.objects.order_by('asker__name', '-date').distinct('asker__name')

This should sort your objects by the asking person's name and date descending, then take the first question for each asking person, which will be the latest one. You didn't mention the backend you are using, so if you are using something like SQLite which doesn't support distinct you may have to do this another way.

Nat Dempkowski
  • 2,331
  • 1
  • 19
  • 36
  • For reasons that are too complex to go into, I *need* to use the `Person` queryset, not the `Question` queryset –  Jul 27 '15 at 21:07
0

This is a partial answer that I'll update, but I've found a way.

Django doesn't like you playing with the GROUP BY statements and they are buried deep. Waaaay deep.

However with this (Django 1.7 only) monkey patch, you can override how grouping is done. In this below example we catch the grouping django thinks you should have and then cuts it back if and only if this query is uses an aggregation (the having_group_by argument is only populated when there is an aggregation.

_get_grouping  = SQLCompiler.get_grouping
def custom_get_grouping(compiler,having_group_by, ordering_group_by):
    fields,thing = _get_grouping(compiler,having_group_by, ordering_group_by)
    if having_group_by:
        fields = fields[0:1]+[".".join(f) for f in having_group_by]
    return fields,thing

SQLCompiler.get_grouping = custom_get_grouping

Hopefully a better way will come soon...