GROUP BY error: column "f.path" must appear in the GROUP BY clause or be used in an aggregate function

Question

In Postgres 12, I'm trying to perform a SELECT on the recipes table bringing only one image (files). However, when performing the query without GROUP BY I get duplicate recipes according to the number of its images (files). When trying to use GROUP BY, I get the following error:

column "f.path" must appear in the GROUP BY clause or be used in an aggregate function

The query I'm running is:

SELECT r.id, r.title, c.name AS chef_name, f.path
FROM recipes AS r
LEFT JOIN chefs AS c ON (r.chef_id = c.id)
LEFT JOIN recipe_files AS rf ON (rf.recipe_id = r.id)
LEFT JOIN files AS f ON (rf.file_id = f.id)
GROUP BY r.id, c.id
ORDER BY r.title ASC

If I add f.path to GROUP BY, I return to the initial problem of receiving the listing with duplicate items according to the number of images (files).

The immediate error can be solved changing the first line to `SELECT r.id, max(r.title), max(c.name) AS chef_name, max(f.path)`, but I suspect the problem lies elsewhere. Please add a few rows of sample data and the expected result. — The Impaler, Aug 05 '20 at 13:18
So if there are multiple files for a recipe, then which one do you want to see? — , Aug 05 '20 at 13:27

score 1 · Answer 1 · answered Aug 05 '20 at 13:29

If you want one row per recipe, then use distinct on:

SELECT DISTINCT ON (r.title, r.id) r.id, r.title, c.name AS chef_name, f.path
FROM recipes r LEFT JOIN chefs AS c ON (r.chef_id = c.id)
     recipe_files AS rf
     ON rf.recipe_id = r.id
     files f
     ON rf.file_id = f.id
ORDER BY r.title, r.id;

score 1 · Accepted Answer · answered Aug 05 '20 at 13:31

If you just want one file for each recipe, you can join to a derived-table that picks only one:

SELECT r.id, r.title, c.name AS chef_name, f1.path
FROM recipes AS r
  LEFT JOIN chefs AS c ON r.chef_id = c.id
  LEFT JOIN (
    SELECT DISTINCT ON (rf.recipe_id) rf.recipe_id, f.path
    FROM recipe_files AS rf 
      JOIN files AS f ON rf.file_id = f.id
    ORDER BY rf.recipe_id, f.id -- picks an arbitrary file
  ) f1 ON f1.recipe_id = r.id
ORDER BY r.title ASC

score 0 · Answer 3 · answered Aug 05 '20 at 13:35

I am more familiar with SQL Server, but it looks like Postgres has a similar function. Try using ROW_NUMBER(). Reference can be found here: https://www.postgresqltutorial.com/postgresql-row_number/

It will be something like this. You will need to modify it to order how you would like.

SELECT 
    r.id
    , r.title
    , c.name AS chef_name
    , f.path
FROM (
    SELECT 
        r.id
        , r.title
        , c.name AS chef_name
        , f.path
        , ROW_NUMBER() OVER(PARTITION BY r.id, r.title, c.name AS chef_name, f.path ORDER BY r.title)
    FROM recipes AS r
        LEFT JOIN chefs AS c ON (r.chef_id = c.id)
        LEFT JOIN recipe_files AS rf ON (rf.recipe_id = r.id)
        LEFT JOIN files AS f ON (rf.file_id = f.id)
) 
WHERE row_number = 1;

score 0 · Answer 4 · answered Aug 05 '20 at 21:36

Eliminate unwanted rows as early as possible. In this case, apply DISTINCT ON before joining to files:

SELECT r.id, r.title, c.name AS chef_name, f.path
FROM   recipes    r
LEFT   JOIN chefs c ON r.chef_id = c.id
LEFT   JOIN (
   SELECT DISTINCT ON (recipe_id)
          recipe_id, file_id
   FROM   recipe_files
   -- without ORDER BY it's truly arbitrary
   ) rf ON rf.recipe_id = r.id
LEFT   JOIN files f ON rf.file_id = f.id
ORDER  BY r.title;

About DISTINCT ON:

Select first row in each GROUP BY group?

The query should be the optimum to retrieve all recipes while there are only few files per recipe.

For many files per recipe, other techniques are (much) faster:

Optimize GROUP BY query to retrieve latest row per user

To retrieve only few recipes, yet other techniques are (much) more efficient:. Like:

SELECT r.id, r.title, c.name AS chef_name, f.path
FROM   recipes    r
LEFT   JOIN chefs c ON r.chef_id = c.id
LEFT   JOIN LATERAL (
   SELECT recipe_id, file_id
   FROM   recipe_files
   WHERE  recipe_id = r.id
   ORDER  BY recipe_id, file_id
   LIMIT  1
   ) rf ON true
LEFT   JOIN files f ON rf.file_id = f.id
WHERE  r.title = 'foo'                     -- some selective filter
ORDER  BY r.title;

See:

Select data within one month prior to each user's last record

GROUP BY error: column "f.path" must appear in the GROUP BY clause or be used in an aggregate function

4 Answers4