Get "latest" row after GROUP BY over multiple tables

Question

I'd preferably like to first query listed below and just group by stories.id, but I get the following error:

ERROR: column "u.first_name" must appear in the GROUP BY clause or be used in an aggregate function LINE 1: SELECT "s".*, "u"."first_name", "u"."last_name", ("i"."filen...

The second query works but does not group by stories.id and generates the wrong results. Is it possible to select from multiple tables and not group by all of them?

The table panels also has a column updated_at. I would like to get the newest file per story according to panels.updated_at.

   SELECT 
        "s".*, 
        "u"."first_name", 
        "u"."last_name", 
        ("i"."filename" || '.' || "i"."extension") AS "file" 
    FROM 
        "stories" "s" 
    LEFT JOIN "panels" "p" ON("p"."story_id" = "s"."id") 
    LEFT JOIN "users" "u" ON("s"."user_id" = "u"."uid")
    LEFT JOIN "images" "i" ON ("p"."image_id" = "i"."id") 
    WHERE 
        "s"."complete" = false AND 
        "s"."created_by" = 205700489 
    GROUP BY 
        "s"."id", 
    ORDER BY 
        "s"."created_at" DESC

   SELECT 
        "s".*, 
        "u"."first_name", 
        "u"."last_name", 
        ("i"."filename" || '.' || "i"."extension") AS "file" 
    FROM 
        "stories" "s" 
    LEFT JOIN "panels" "p" ON("p"."story_id" = "s"."id") 
    LEFT JOIN "users" "u" ON("s"."user_id" = "u"."uid")
    LEFT JOIN "images" "i" ON ("p"."image_id" = "i"."id") 
    WHERE 
        "s"."complete" = false AND 
        "s"."created_by" = 205700489 
    GROUP BY 
        "s"."id", 
        "u"."first_name", 
        "u"."last_name", "i"."filename", 
        "i"."extension" 
    ORDER BY 
        "s"."created_at" DESC

Your question is inconclusive. If you want to group by `story.id`, you *have* to define what to do pick from multiple rows. — Erwin Brandstetter, Nov 29 '12 at 15:38

score 2 · Accepted Answer · edited May 23 '17 at 12:04

2

Updated after clarification of the question:

SELECT DISTINCT ON (s.created_at, s.id)
       s.*
      ,u.first_name
      ,u.last_name
      ,concat_ws('.', i.filename, i.extension) AS file
FROM   stories s 
LEFT   JOIN users  u ON u.uid = s.user_id
LEFT   JOIN panels p ON p.story_id = s.id
LEFT   JOIN images i ON i.id = p.image_id
WHERE  s.complete = false
AND    s.created_by = 205700489 
ORDER  BY s.created_at DESC, s.id, p.updated_at DESC;

Grouping by primary key requires PostgreSQL 9.1.
I use concat_ws(), because I don't know which columns might be NULL. If both i.filename and i.extension are defined NOT NULL, you can simplify.

Effect of the additional ORDER BY item p.updated_at DESC is that the "newest" file will be picked per story. The query technique is explained in full under this related question:
Select first row in each GROUP BY group?

edited May 23 '17 at 12:04

Community

1
1

answered Nov 29 '12 at 15:47

Erwin Brandstetter

605,456
145
1,078
1,228

This query works okey except I get a list of files `97799.jpg, 5234.jpg` what I'm really looking for is the last file panels has an order and created by column and I'd need only the newest. – Stefan Nov 29 '12 at 16:16
1

@stefan: You need to edit your question and put all the information there. This is a public forum and a question should paint a complete picture. Comments are simply not good enough. Use `edit` right under your question. – Erwin Brandstetter Nov 29 '12 at 17:09

Ihor Romanchenko · Answer 2 · 2012-11-29T15:28:17.493

You can write something like:

   SELECT 
        "s".*, 
        (SELECT "u"."first_name" 
         FROM "users" "u"
         WHERE "s"."user_id" = "u"."uid"
         LIMIT 1) , 
        (SELECT "u"."last_name" 
         FROM "users" "u"
         WHERE "s"."user_id" = "u"."uid"
         LIMIT 1), 
        (SELECT "i"."filename" || '.' || "i"."extension"
         FROM "panels" "p" 
         JOIN "images" "i" ON ("p"."image_id" = "i"."id")
         WHERE "p"."story_id" = "s"."id" 
         LIMIT 1) AS "file" 
    FROM 
        "stories" "s" 
    WHERE 
        "s"."complete" = false AND 
        "s"."created_by" = 205700489 
    ORDER BY 
        "s"."created_at" DESC

It will get only 1 record from "users" and "panels" JOIN "images" per record in "stories" .

Add ORDER BY, extra WHERE or some aggregates to get what you need from "users" and "panels" JOIN "images"

UPD Also, you can use something like this:

SELECT *
FROM (
SELECT DISTINCT ON ("s"."id") 
    "s".*, 
    "u"."first_name", 
    "u"."last_name", 
    ("i"."filename" || '.' || "i"."extension") AS "file" 
FROM 
    "stories" "s" 
LEFT JOIN "panels" "p" ON("p"."story_id" = "s"."id") 
LEFT JOIN "users" "u" ON("s"."user_id" = "u"."uid")
LEFT JOIN "images" "i" ON ("p"."image_id" = "i"."id") 
WHERE 
    "s"."complete" = false AND 
    "s"."created_by" = 205700489 
ORDER BY 
    "s"."id"
) t ORDER BY "t"."created_at" DESC

It will leave only one row for every distinct "s"."id"

Is there a performance difference in using so many sub selects? — Stefan, Nov 29 '12 at 15:01
@stefan Most of the time postgres will optimize the query, so there wont be much performance difference. — Ihor Romanchenko, Nov 29 '12 at 15:23
@stefan: Generally, correlated subqueries are an anti-pattern for performance. Avoid them if you can. But as @Igor mentioned, Postgres can optimize the query plan under favorable conditions. Test your queries with [`EXPLAIN ANALYZE`](http://www.postgresql.org/docs/current/interactive/sql-explain.html) and compare to get definitive answers. Run multiple times to exclude caching effects. — Erwin Brandstetter, Nov 29 '12 at 16:11

Get "latest" row after GROUP BY over multiple tables

2 Answers2