2

I'm using PostgreSQL.
I want to select 1 row for every unique id, I have this query:

SELECT DISTINCT ON (station_id) *
FROM readings
ORDER BY station_id,reading_id DESC;

I have 2 tables:

readings (content isn't important, I hope):

station_id  reading_id    temp   air_pressure
--------------------------------------------
  147          100         10       800
  148          101         20       850
  149          102         30       900
  148          103         40       950
  148          104         50       1000
  147          105         60       1050

stations (content not important I hope):

id   station_name   lat   lng
-----------------------------
147      
148     
149    

So readings.station_id and station.id in stations should be the same.

My query works, but now I want to select the second newest one (2nd highest id): ignore the first row (newest reading) and select the second.

Desired result based on above sample:

station_id   reading_id   temp   air_pressure
----------------------------------------------
   148          103        40        950        
   147          100        10        800

I want the row with the second highest reading_id for every station_id (second newest reading).

Would be nice if it won't select the ones where there is only one row, but not necessary.

I'm not sure how to achieve this. I've tried so much with offset, limit etc. also tried some nested select queries ...

I want to use distinct on because I need to select second row from every unique id. But if it's not possible with distinct on, it's alright.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
Ovelion
  • 61
  • 4
  • 1
    With `DISTINCT ON(station_id)` there will be no second row for `station_id`. To your question add example data showing what you are starting with and what you want to end up with? – Adrian Klaver Jun 29 '22 at 23:45
  • There are (very) efficient solutions. Is there a table `station` with one row per relevant `station_id`, too? Also: your version of Postgres, and the exact table definition (`CREATE TABLE` statement)? And roughly how many rows per `station_id`? And what if there is just a single row per `station_id`? – Erwin Brandstetter Jun 29 '22 at 23:59

1 Answers1

4

The best query heavily depends on undisclosed information.

Assuming:

  • current Postgres 14 (all should work since at least Postgres 9.3)
  • large table (?)
  • many rows per station_id (?)
  • no duplicates on (station_id, reading_id) (?)
  • a stations table with 1 row per station ✅
  • reading_idis defined NOT NULL. (?)
SELECT r.*
FROM   station s
CROSS  JOIN LATERAL (
   SELECT *
   FROM   readings r
   WHERE  r.station_id = s.station_id
   ORDER  BY reading_id DESC
   OFFSET 1
   LIMIT  1
   ) r
ORDER  BY r.station_id;  -- optional

db<>fiddle here

You get no row for stations with less than two rows.

This is very efficient for more than a few rows per station and an index on readings(station_id, reading_id DESC).

About the LATERAL join:

If you don't have a separate stations table, see:

If there can be duplicates (doesn't seem to be the case), and/or there are only few rows per group, this might be better:

SELECT (r).*
FROM  (
   SELECT r, dense_rank() OVER (PARTITION BY station_id
                                ORDER BY reading_id DESC) AS rnk
   FROM   readings r
   ) r
WHERE  rnk = 2;

Here, you get no row for stations with less than two distinct reading_id. But possibly more than one per station that tie for second place.

Related:

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228