0

In Postgres 12, I'm trying to perform a SELECT on the recipes table bringing only one image (files). However, when performing the query without GROUP BY I get duplicate recipes according to the number of its images (files). When trying to use GROUP BY, I get the following error:

column "f.path" must appear in the GROUP BY clause or be used in an aggregate function

The query I'm running is:

SELECT r.id, r.title, c.name AS chef_name, f.path
FROM recipes AS r
LEFT JOIN chefs AS c ON (r.chef_id = c.id)
LEFT JOIN recipe_files AS rf ON (rf.recipe_id = r.id)
LEFT JOIN files AS f ON (rf.file_id = f.id)
GROUP BY r.id, c.id
ORDER BY r.title ASC

If I add f.path to GROUP BY, I return to the initial problem of receiving the listing with duplicate items according to the number of images (files).

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • The immediate error can be solved changing the first line to `SELECT r.id, max(r.title), max(c.name) AS chef_name, max(f.path)`, but I suspect the problem lies elsewhere. Please add a few rows of sample data and the expected result. – The Impaler Aug 05 '20 at 13:18
  • 2
    So if there are multiple files for a recipe, then which one do you want to see? –  Aug 05 '20 at 13:27

4 Answers4

1

If you want one row per recipe, then use distinct on:

SELECT DISTINCT ON (r.title, r.id) r.id, r.title, c.name AS chef_name, f.path
FROM recipes r LEFT JOIN chefs AS c ON (r.chef_id = c.id)
     recipe_files AS rf
     ON rf.recipe_id = r.id
     files f
     ON rf.file_id = f.id
ORDER BY r.title, r.id;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
1

If you just want one file for each recipe, you can join to a derived-table that picks only one:

SELECT r.id, r.title, c.name AS chef_name, f1.path
FROM recipes AS r
  LEFT JOIN chefs AS c ON r.chef_id = c.id
  LEFT JOIN (
    SELECT DISTINCT ON (rf.recipe_id) rf.recipe_id, f.path
    FROM recipe_files AS rf 
      JOIN files AS f ON rf.file_id = f.id
    ORDER BY rf.recipe_id, f.id -- picks an arbitrary file
  ) f1 ON f1.recipe_id = r.id
ORDER BY r.title ASC
0

I am more familiar with SQL Server, but it looks like Postgres has a similar function. Try using ROW_NUMBER(). Reference can be found here: https://www.postgresqltutorial.com/postgresql-row_number/

It will be something like this. You will need to modify it to order how you would like.

SELECT 
    r.id
    , r.title
    , c.name AS chef_name
    , f.path
FROM (
    SELECT 
        r.id
        , r.title
        , c.name AS chef_name
        , f.path
        , ROW_NUMBER() OVER(PARTITION BY r.id, r.title, c.name AS chef_name, f.path ORDER BY r.title)
    FROM recipes AS r
        LEFT JOIN chefs AS c ON (r.chef_id = c.id)
        LEFT JOIN recipe_files AS rf ON (rf.recipe_id = r.id)
        LEFT JOIN files AS f ON (rf.file_id = f.id)
) 
WHERE row_number = 1;
Kupokev
  • 169
  • 8
0

Eliminate unwanted rows as early as possible. In this case, apply DISTINCT ON before joining to files:

SELECT r.id, r.title, c.name AS chef_name, f.path
FROM   recipes    r
LEFT   JOIN chefs c ON r.chef_id = c.id
LEFT   JOIN (
   SELECT DISTINCT ON (recipe_id)
          recipe_id, file_id
   FROM   recipe_files
   -- without ORDER BY it's truly arbitrary
   ) rf ON rf.recipe_id = r.id
LEFT   JOIN files f ON rf.file_id = f.id
ORDER  BY r.title;

About DISTINCT ON:

The query should be the optimum to retrieve all recipes while there are only few files per recipe.

For many files per recipe, other techniques are (much) faster:

To retrieve only few recipes, yet other techniques are (much) more efficient:. Like:

SELECT r.id, r.title, c.name AS chef_name, f.path
FROM   recipes    r
LEFT   JOIN chefs c ON r.chef_id = c.id
LEFT   JOIN LATERAL (
   SELECT recipe_id, file_id
   FROM   recipe_files
   WHERE  recipe_id = r.id
   ORDER  BY recipe_id, file_id
   LIMIT  1
   ) rf ON true
LEFT   JOIN files f ON rf.file_id = f.id
WHERE  r.title = 'foo'                     -- some selective filter
ORDER  BY r.title;

See:

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228