0

I was looking a way to select the first item from a GROUP BY in PostgreSQL, until I find this stackoverflow: Select first row in each GROUP BY group?

There, I see that the WITH command was used. I'm trying to understand some more "advanced" commands of SQL, like PARTITION, WITH, ROW_NUMBER etc. Until two or three months ago, I known only the basic commands (SELECT, INNER JOIN, LEFT JOIN, ORDER BY, GROUP BY, etc);

I have a little problem (resolved, but I don't known if this is the better way* to do).

*better way = I'm more concerned about a clean SQL code than the performance - this is just for a reports that will be executed once a day, and no more than 5000 records.

I have two tables, in PostgreSQL:

+----------------------------------------------+
| TABLE NAME: point                            |
+--------+---------------+----------+----------+
|     km |      globalid |      lat |     long |
+--------+---------------+----------+----------+
|  36600 | 1553E2AB-B2F8 | -1774.44 | -5423.58 |
| 364000 | 25EB2465-1B8A | -1773.42 | -5422.03 |
| 362000 | 5FFDE611-88DF | -1771.80 | -5420.37 |
+--------+---------------+----------+----------+


+---------------------------------------------------------+
| TABLE NAME: photo                                       |
+--------------+---------------+------------+-------------+
| attachmentid |  rel_globalid |       date |    filename |
+--------------+---------------+------------+-------------+
|            1 | 1553E2AB-B2F8 | 2015-02-24 | photo01.jpg |
|            2 | 1553E2AB-B2F8 | 2015-02-24 | photo02.jpg |
|          405 | 25EB2465-1B8A | 2015-02-12 | photo03.jpg |
|          406 | 25EB2465-1B8A | 2015-02-12 | photo04.jpg |
|          407 | 25EB2465-1B8A | 2015-02-13 | photo06.jpg |
|            3 | 5FFDE611-88DF | 2015-02-12 | photo07.jpg |
+--------------+---------------+------------+-------------+

So, for the problem:

Every point has one or more photos, but I only need the point data, and first and the last photo. If point has only one photo, I need only the first photo. If point has three photos, I need only the first and the third photo.

So, how I resolved:

First, I need the first photo of every point, so, I grouped by rel_globalid, and numbered every photo by group:

WITH photos_numbered AS (
    SELECT
      rel_globalid,
      date,
      filename,
      ROW_NUMBER()
      OVER (
        PARTITION BY rel_globalid
        ORDER BY date
      ) AS photo_num
    FROM
      photo
)

With this code, I can get the 2th, 3th and so on too.

Ok, so, now, I want to get the first photo (still using the WITH above):

SELECT *
FROM
  photos_numbered
WHERE
  photo_num = 1

And to get the last photo, I used the following SQL:

SELECT
  p1.*
FROM
  photos_numbered p1
JOIN (
  SELECT
    rel_globalid,
    max(photo_num) photo_num
  FROM
    photos_numbered
  GROUP BY
    rel_globalid
  ) p2
  ON
    p1.rel_globalid = p2.rel_globalid AND
    p1.photo_num = p2.photo_num
WHERE
  p1.photo_num > 1

The WHERE p1.photo_num > 1 is because if point has only one photo, this photo will appear as first photo, and last photo will be NULL.

OK, now I must "convert" the SELECT for the first photo and the last photo to a WITH, and do a simple SELECT with a INNER JOIN for the first photo and a LEFT JOIN for the last photo:

WITH photos_numbered AS (
    SELECT
      rel_globalid,
      date,
      filename,
      ROW_NUMBER()
      OVER (
        PARTITION BY rel_globalid
        ORDER BY date
      ) AS photo_num
    FROM
      photo
), first_photo AS (
    SELECT *
    FROM
      photos_numbered
    WHERE
      photo_num = 1
), last_photo AS (
    SELECT p1.*
    FROM
      photos_numbered p1
      JOIN (
             SELECT
               rel_globalid,
               max(photo_num) photo_num
             FROM
               photos_numbered
             GROUP BY
               rel_globalid
           ) p2
        ON p1.rel_globalid = p2.rel_globalid AND
           p1.photo_num = p2.photo_num
    WHERE
      p1.photo_num > 1
)
SELECT DISTINCT
  point.km,
  point.globalid,
  point.lat,
  point."long",
  first_photo.date     AS fp_date,
  first_photo.filename AS fp_filename,
  last_photo.date      AS lp_date,
  last_photo.filename  AS lp_filename
FROM
  point
  INNER JOIN
  first_photo
    ON
      first_photo.rel_globalid = point.globalid
  LEFT JOIN
  last_photo
    ON
      last_photo.rel_globalid = point.globalid
ORDER BY
  km

I think this SQL is huge for a 'simple thing'!

Is working? Yes, but I want some advices, some documentations that I can read and understand better, some commands that maybe I can use to make a "better" SQL (like I said, about two or three months ago I don't even know the PARTITION and WITH commands).

I tried to put a link for SQLFiddle here, but SQLFiddle never worked for me (always return 'oops' message).

Community
  • 1
  • 1
Roberto Correia
  • 1,696
  • 5
  • 20
  • 36

2 Answers2

2

If you are looking for clean SQL, then try lateral left join together with first_value and last_value window functionsinstead of common table expression (WITH clause):

select *
from point po
left join lateral 
(
   select first_value( date )     over( order by ph.date) as first_photo_date,
          first_value( filename ) over( order by ph.date) as first_photo_filename,
          last_value( date )      over( order by ph.date) as last_photo_date,
          last_value( filename )  over( order by ph.date) as last_photo_filename    
   from photo ph
   where po.globalid = ph.rel_globalid 
   limit 1
) q on true
;

an additional count(*) over() with a case expression can be used to "clean" values of last photo when there is only one record:

select *
from point po
left join lateral 
(
   select first_value( date )     over( order by ph.date) as first_photo_date,
          first_value( filename ) over( order by ph.date) as first_photo_filename,
          case when count(*) over () > 1 
               then last_value( date )    over( order by ph.date)
          end as last_photo_date,
          case when count(*) over () > 1 
                then last_value( filename )  over( order by ph.date) 
          end as last_photo_filename    
   from photo ph
   where po.globalid = ph.rel_globalid 
   limit 1
) q on true
;
krokodilko
  • 35,300
  • 7
  • 55
  • 79
  • tks! left join lateral is new for me! unfortunately, it's not compatible with postgresql 9.2 running here (don't ask me why they are running this version)! – Roberto Correia Feb 07 '17 at 13:54
0

Using the answer from krokodilko, I made a new SQL query without LEFT JOIN LATERAL, because I'm using PostgreSQL 9.2 (without LEFT JOIN LATERAL).

SELECT DISTINCT
  po.km,
  po.globalid,
  po.lat,
  po."long",
  ph.fp_date,
  ph.fp_filename,
  ph.lp_date,
  ph.lp_filename
FROM
  point po
INNER JOIN
  (
    SELECT DISTINCT
      rel_globalid,
      first_value(date) OVER (PARTITION BY ph.rel_globalid) AS fp_date,
      first_value(filename) OVER (PARTITION BY ph.rel_globalid) AS fp_filename,
      CASE WHEN count(*) OVER (PARTITION BY ph.rel_globalid) > 1 THEN 
        last_value(date) OVER (PARTITION BY ph.rel_globalid)
      END AS lp_date,
      CASE WHEN count(*) OVER (PARTITION BY ph.rel_globalid) > 1 THEN 
        last_value(filename) OVER (PARTITION BY ph.rel_globalid)
      END AS lp_filename
    FROM
      photo ph
    ORDER BY
      rel_globalid
  ) ph
  ON ph.rel_globalid = po.globalid

only thing I don't like it's OVER (PARTITION) in almost every field in INNER JOIN

Roberto Correia
  • 1,696
  • 5
  • 20
  • 36