4

I have a really strange problem with Django 1.4.4.

I have this model :

class LogQuarter(models.Model):
  timestamp = models.DateTimeField()
  domain = models.CharField(max_length=253)
  attempts = models.IntegerField()
  success = models.IntegerField()
  queue = models.IntegerField()
  ...

I need to gather the first 20 domains with the higher sent property. The sent property is attempts - queue.

This is my request:

obj = LogQuarter.objects\
      .aggregate(Sum(F('attempts')-F('queue')))\
      .values('domain')\
      .filter(**kwargs)\
      .order_by('-sent')[:20]

I tried with extra too and it isn't working.

It's really basic SQL, I am surprised that Django can't do this.

Did someone has a solution ?

SIkwan
  • 389
  • 1
  • 2
  • 14

2 Answers2

4

You can actually do this via subclassing some of the aggregation functionality. This requires digging in to the code to really understand, but here's what I coded up to do something similar for MAX and MIN. (Note: this code is based of Django 1.4 / MySQL).

Start by subclassing the underlying aggregation class and overriding the as_sql method. This method writes the actual SQL to the database query. We have to make sure to quote the field that gets passed in correctly and associate it with the proper table name.

from django.db.models.sql import aggregates
class SqlCalculatedSum(aggregates.Aggregate):
  sql_function = 'SUM'
  sql_template = '%(function)s(%(field)s - %(other_field)s)'

  def as_sql(self, qn, connection):
    # self.col is currently a tuple, where the first item is the table name and
    # the second item is the primary column name. Assuming our calculation is
    # on two fields in the same table, we can use that to our advantage. qn is
    # underlying DB quoting object and quotes things appropriately. The column
    # entry in the self.extra var is the actual database column name for the
    # secondary column.
    self.extra['other_field'] = '.'.join(
        [qn(c) for c in (self.col[0], self.extra['column'])])
    return super(SqlCalculatedSum, self).as_sql(qn, connection)

Next, subclass the general model aggregation class and override the add_to_query method. This method is what determines how the aggregate gets added to the underlying query object. We want to be able to pass in the field name (e.g. queue) but get the corresponding DB column name (in case it is something different).

from django.db import models
class CalculatedSum(models.Aggregate):
  name = SqlCalculatedSum

  def add_to_query(self, query, alias, col, source, is_summary):
    # Utilize the fact that self.extra is set to all of the extra kwargs passed
    # in on initialization. We want to get the corresponding database column
    # name for whatever field we pass in to the "variable" kwarg.
    self.extra['column'] = query.model._meta.get_field(
        self.extra['variable']).db_column
    query.aggregates[alias] = self.name(
        col, source=source, is_summary=is_summary, **self.extra)

You can then use your new class in an annotation like this:

queryset.annotate(calc_attempts=CalculatedSum('attempts', variable='queue'))

Assuming your attempts and queue fields have those same db column names, this should generate SQL similar to the following:

SELECT SUM(`LogQuarter`.`attempts` - `LogQuarter`.`queue`) AS calc_attempts

And there you go.

Chad
  • 1,794
  • 1
  • 17
  • 30
0

I am not sure if you can do this Sum(F('attempts')-F('queue')). It should throw an error in the first place. I guess, easier approach would be to use extra.

result = LogQuarter.objects.extra(select={'sent':'(attempts-queue)'}, order_by=['-sent'])[:20]
Raunak Agarwal
  • 7,117
  • 6
  • 38
  • 62
  • The problem with this solution is that I don't have a group_by on the domains. I gather the first 20 rows, but not aggregated. – SIkwan Dec 17 '12 at 17:33