0

I want to return the last report of a given range of units. The last report will be identified by its time of creation. Therefore, the result would be a collection of last reports for a given range of units. I do not want to use a bunch of SELECT statements e.g.:

SELECT * FROM reports WHERE unit_id = 9999 ORDER BY time desc LIMIT 1
SELECT * FROM reports WHERE unit_id = 9998 ORDER BY time desc LIMIT 1
...

I initially tried this (but already knew it wouldn't work because it will only return 1 report):

'SELECT reports.* FROM reports INNER JOIN units ON reports.unit_id = units.id WHERE units.account_id IS NOT NULL AND units.account_id = 4 ORDER BY time desc LIMIT 1'

So I am looking for some kind of solution using subqueries or derived tables, but I can't just seem to figure out how to do it properly:

'SELECT reports.* FROM reports
WHERE id IN 
(
  SELECT id FROM reports
  INNER JOIN units ON reports.unit_id = units.id
  ORDER BY time desc
  LIMIT 1
) 

Any solution to do this with subqueries or derived tables?

JohnMerlino
  • 3,900
  • 4
  • 57
  • 89
  • have you explored the [HAVING](http://www.techonthenet.com/mysql/having.php) clause? – bobkingof12vs Mar 04 '14 at 22:02
  • I think this question may help - http://stackoverflow.com/questions/3800551/select-first-row-in-each-group-by-group?rq=1 as it sounds familiar. – SamV Mar 04 '14 at 22:03

1 Answers1

2

The simple way to do this in Postgres uses distinct on:

select distinct on (unit_id) r.*
from reports r
order by unit_id, time desc;

This construct is specific to Postgres and databases that use its code base. It the expression distinct on (unit_id) says "I want to keep only one row for each unit_id". The row chosen is the first row encountered with that unit_id based on the order by clause.

EDIT:

Your original query would be, assuming that id increases along with the time field:

SELECT r.*
FROM reports r
WHERE id IN (SELECT max(id)
             FROM reports
             GROUP BY unit_id
            );

You might also try this as a not exists:

select r.*
from reports r
where not exists (select 1
                  from reports r2
                  where r2.unit_id = r.unit_id and
                        r2.time > r.time
                 );

I thought the distinct on would perform well. This last version (and maybe the previous) would really benefit from an index on reports(unit_id, time).

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Is there a way to do this with a where clause like this: "SELECT DISTINCT ON (unit_id) r.* FROM reports r WHERE reports.unit_id IN (9988,9999) ORDER BY unit_id, time DESC". This gives me something like invalid reference to FROM-clause entry for table "reports" – JohnMerlino Mar 04 '14 at 23:04
  • @JohnMerlino I think you just failed to use the alias; `WHERE r.unit_id IN (...)` – Craig Ringer Mar 04 '14 at 23:22
  • I tried using this query, but it is running incredibly slow. After around 10 minutes, I just had to cancel the load. When I was using the SELECT statements, it took long but not nearly as long (only 7 or 8 seconds). – JohnMerlino Mar 05 '14 at 00:27
  • Any ideas why this query runs so slow? – JohnMerlino Mar 05 '14 at 05:10