I have a model like this (simplified):
class Events(models.Model):
reference_date = models.DateField()
event = models.IntegerField(db_index=True)
created_at = models.DateTimeField()
updated_at = models.DateTimeField()
class Meta:
unique_together = (('reference_date', 'event'),)
I can have the same event
with multiple reference dates, but I only have one event
per reference_date
. The table data looks something like this:
id | reference_date | event | created_at | updated_at |
---|---|---|---|---|
1 | 2022-01-01 | 12345 | 2022-03-05 18:18:03 | 2022-03-06 18:12:09 |
2 | 2022-01-02 | 12345 | 2022-03-08 08:05:11 | 2022-03-08 08:05:55 |
3 | 2022-01-08 | 12345 | 2022-06-15 18:18:12 | 2022-06-16 02:23:11 |
4 | 2022-01-01 | 98765 | 2022-01-11 07:55:25 | 2022-01-13 08:45:12 |
5 | 2022-01-02 | 98765 | 2022-06-22 10:25:08 | 2022-07-05 18:55:08 |
6 | 2022-01-09 | 45678 | 2022-02-19 12:55:07 | 2022-04-16 12:21:05 |
7 | 2022-01-10 | 45678 | 2022-03-05 11:23:45 | 2022-03-05 18:55:03 |
I need the latest record for each event
. But I need all the event
attributes, not just the max(reference_date)
I'm looking for this result:
[
{'id': 3, 'event': 12345, 'reference_date': '2022-01-08', 'created_at': '2022-06-15 18:18:12', 'updated_at': '2022-06-16 02:23:11'},
{'id': 5, 'event': 98765, 'reference_date': '2022-01-02', 'created_at': '2022-06-22 10:25:08', 'updated_at': '2022-07-05 18:55:08'},
{'id': 7, 'event': 45678, 'reference_date': '2022-01-10', 'created_at': '2022-03-05 11:23:45', 'updated_at': '2022-03-05 18:55:03'}
]
From a 'sql-perspective' I could get the results in multiple ways: SUBQUERIES, ROW_NUMBER, CORRELATED SUBQUERY etc. In this particular case for clarity reasons I prefer to use a join with itself using an aggregate inside the subquery.
If I'd write a raw query, I'd would do like this:
SELECT
e.*
FROM events as e
INNER JOIN (
SELECT
event,
max(reference_date) as reference_date
FROM events
GROUP BY event) AS b
ON b.reference_date = e.reference_date AND b.event = e.event
I think this query is very straightforward and has a good performance for the data volume in this table (hundreds of thousands of records)
I looked for several ways to build this type of query (Subquery, PrefetchRelated etc) but I couldn't find any proper way to translate this query into Django ORM syntax.
Is there a way using the 'django-orm way' without a convoluted ORM solution with the same reasonable performance?
ps: I need this query (or similar ones) for different database engines (PostgreSql, MySQL, MSQLServer ..)