0

I am using timescaledb which is basically just an extension for postgres. It comes with a SQL function called time_bucket. I want to use this function in combination with the ORM to generate a query as follows:

SELECT
  time_bucket('1 minute', time) AS tb,
  AVG(s0) 
FROM measurements
WHERE
  time >= to_timestamp(1) AND
  time <= to_timestamp(2)
GROUP BY tb
ORDER BY tb ASC;

models.py:

class Measurement(models.Model):

    device_id = models.IntegerField(primary_key=True)
    time = models.DateTimeField()
    s0 = models.FloatField(blank=True, null=True)
    s1 = models.FloatField(blank=True, null=True)

My try so far:

class TimeBucket(Func):

    function = 'time_bucket'
    template = '%(function)s(\'{bucket_width}\', %(expressions)s)'.format(bucket_width='1 minute')


(Measurement.objects
    .values('time')
    .annotate(tb=TimeBucket('time'))
    .annotate(s_desc=Avg('s0'))
    .filter(
        time__gte=datetime.fromtimestamp(start),
        time__lte=datetime.fromtimestamp(end))
    .order_by('tb')
)

Results in:

SELECT
  "measurements"."time",
  time_bucket('1 minute', "measurements"."time") AS "tb",
  (AVG("measurements"."s0")) AS "s_desc"
FROM "measurements"
WHERE (
  "measurements"."time" <= 2447-10-02 14:17:01+00:00 AND 
  "measurements"."time" >= 1970-01-01 00:00:01+00:00
)
GROUP BY "measurements"."time", time_bucket('1 minute', "measurements"."time")
ORDER BY "tb" ASC

As you see there are two ugly points left:

  • How could I use the alias tb in the GROUP BY instead of repeating it?
  • I only need to query time_bucket and s0. How to get rid of time without breaking the query?
Yannic Hamann
  • 4,655
  • 32
  • 50

1 Answers1

2

The un-truncated column time can be used only in filter before the first annotate(...) and never more.

The truncated time annotation must be used in .values() (e.g. .values('tb')) before any aggregation function.

qs = (
    Measurement.objects
    .filter(...)
    .annotate(tb=TimeBucket('time'))
    .values('tb')
    .annotate(s_desc=Avg('s0'))
    .order_by('tb')
)

About alias in GROUP BY: The alias can be generally used only in ORDER BY in all databases, but only specific database allow it in GROUP BY or WHERE (only MySQL and Postgresql). This is probably because the GROUP BY and WHERE clauses are evaluated before the SELECT clause. This can be bypassed by a subquery, but not useful at all. I'm sure that the standard query plan optimizer of every modern database driver reuses the auxiliary expressions from GROUP BY and it never evaluates the function repeatedly. The alias is useful if the SQL is written manually and read by humans, but not useful to implement it in Django ORM compiler by alias only for some backends.


EDIT: This solution works in Django >= 1.11 and tested also in Django 3.0.

(There was an incorrect information about Django versions. I don't remember the solution in Django <=1.10, but I'm sure that it was more complicated than in Django 1.11. Maybe there had to be written an additional final .values('tb', 's_desc'))

hynekcer
  • 14,942
  • 6
  • 61
  • 99
  • I know this is old, but I'm finding that this still includes `time` as part of the group by query. Why is this the case? – Rhubarb Jun 14 '20 at 23:08
  • 1
    @Rhubarrbb I fixed and extended the answer. The original simple answer was correct. The first old edit about Django versions was invalid. – hynekcer Jun 15 '20 at 10:30