1

Backstory (can skip): I have a database with records of events. Events for each label occurred around a certain period in the year. I would like to find when in the year, more or less, that group of events occurred. Therefore I planned to calculate the average timestamp per group, and do this efficiently with postgres, instead of fetching all the timestamps and calculating locally.

The question: I'm trying to average a timestamp without time zone postgres field with django's excellent ORM like so:

from django.db.models import Avg
ModelName.objects.filter(a_field='some value').aggregate(Avg('time'))

However I'm getting:

function avg(timestamp without time zone) does not exist
LINE 1: SELECT AVG("model_name"."time") AS "time__avg" FROM "m...
           ^
HINT:  No function matches the given name and argument types. You might need to add explicit type casts.

Is there a way to do this with django's ORM?
If not, how then do I workaround?

Jonathan Livni
  • 101,334
  • 104
  • 266
  • 359
  • 1
    The Postgres function `avg()` does not accept `timestamp`'s as the error indicates. – Adrian Klaver May 30 '22 at 21:40
  • This sounds like an XY problem. Kindly explain the issue you are trying to solve rather than how you are trying to solve it. Exactly what does avg(some timestamp) actually mean, what is the full query. – Belayer May 30 '22 at 22:36
  • @Belayer - added, even though I'm not sure the backstory helps much with the actual question – Jonathan Livni May 31 '22 at 05:22
  • @AdrianKlaver yes, I figured as much. curious if this is a missing feature in django, or an intentional feature and if so why. Also, how would you go about doing it? – Jonathan Livni May 31 '22 at 05:23
  • This really has nothing to do with Django other then `aggregate(Avg('time')` is translated into `AVG("model_name"."time")`. Not even sure what an average of timestamps would be? Postgres `avg()` will work on `interval`s, so if you can get the times into intervals then you could do this. – Adrian Klaver May 31 '22 at 14:47

1 Answers1

1

I had a similar problem, where I wanted to find the average time taken to vote for a particular item. But postgres wouldn't allow taking the average of datetimes. Doing so would result in the following error:

django.db.utils.ProgrammingError: function avg(timestamp with time zone) does not exist
LINE 1: SELECT "votes_item"."name", AVG("votes_vote"."datetime") AS ...
                                    ^
HINT:  No function matches the given name and argument types. You might need to add explicit type casts.

To make this simpler consider the following tables, where the Vote has foreign key relation to Item:

The Item table:

id: pK name: char datetime(time at which the item was inserted)
1 Apple 22-06-23 11:25:33
2 Orange 22-06-22 01:22:18

The Vote table:

id: pK user: Fk (user who voted for the item) item: Fk (the item the user voted) vote: (1 for positive vote and -1 negative vote) datetime (time at which the item was voted)
1 1 1 1 2022-06-22 11:26:18
2 3 1 1 2022-06-21 12:26:36
3 2 1 1 2022-06-26 01:20:59

I wanted to know the average time at which users voted for each item. For eg: all the avg time taken by users to vote Apple (i.e annotate)

Since postgres avg function doesn't directly take in datetime, first convert it to seconds then take the average and convert it back to datetime.

To make things simpler create two class as shown below.

from django.db import models

class Epoch(models.expressions.Func):

    template = 'EXTRACT(epoch FROM %(expressions)s)::FLOAT'
    output_field = models.FloatField()


class DateTimeFromFloat(models.expressions.Func):

    template = 'To_TIMESTAMP(%(expressions)s)::TIMESTAMP at time zone \'UTC\''
    output_field = models.DateTimeField()

read more about Func in this excellent answer

Now I wanted to get the average time at which each item was voted positive

So I use

Item.objects.filter(vote__vote=1).annotate(avg_time=DateTimeFromFloat(Avg(Epoch('vote__datetime')))).values('avg_time', 'name')

important part:

annotate(avg_time=DateTimeFromFloat(Avg(Epoch('vote__datetime'))))

output

<QuerySet [{'name': 'Apple', 'avg_time': datetime.datetime(2022, 6, 23, 8, 24, 37, 666667, tzinfo=datetime.timezone.utc)}]>

You can perform a similar operation using aggregate.

Art
  • 2,836
  • 4
  • 17
  • 34