0

Is there a way for me use the MAX() function for the imdb_score on each genres (ARRAYS)? As I would like to get the highest imdb_score per genre that would show me the title.

    title                                    genres         imdb_score
"Five Came Back: The Reference Films"   "{documentation}"     NULL
"Taxi Driver"                           "{crime,drama}"       8.3
"Monty Python and the Holy Grail"       "{comedy,fantasy}"    8.2
"Life of Brian"                         "{comedy}"            8
"The Exorcist"                          "{horror}"            8.1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • 1
    What do you mean by "*… that would show me the title*"? Can you post the expected result of the query, please? – Bergi Jul 08 '22 at 01:15
  • Seems possible by unnesting the array: `SELECT g.genre, MAX(f.imdb_score) FROM films f, LATERAL unnest(f.genres) AS g(genre) GROUP BY g.genre` – Bergi Jul 08 '22 at 01:17

1 Answers1

0

This shows the film with the highest score per genre:

SELECT DISTINCT ON (genre) *
FROM  (
   SELECT unnest(genres) AS genre, title, imdb_score
   FROM   tbl
   ) sub
ORDER  BY genre, imdb_score DESC NULLS LAST;

db<>fiddle here

I think that's what you want.

Unnest the array with unnest().
About the set-returning function unnest() in the SELECT list:

You did not define how to break ties. (Multiple films with equally the highest score in the genre.) So this query picks an arbitrary winner.

About DISTINCT ON:

Since the imdb_score can be NULL, it's crucial to add NULLS LAST to the descending sort order. See:


If you insist on max() ...
Also returning all winning films per genre:

WITH flat AS (
   SELECT unnest(genres) AS genre, imdb_score, title
   FROM   tbl
   )
SELECT genre, title, imdb_score
FROM  (
   SELECT genre, max(imdb_score) AS imdb_score
   FROM   flat
   GROUP  BY 1
   ) sub2
JOIN   flat f USING (genre, imdb_score)
ORDER  BY genre, title;

Much slower and convoluted. To get all winners per genre, rather use the window function rank():

SELECT genre, title, imdb_score
FROM  (
   SELECT genre, title, imdb_score
        , rank() OVER (PARTITION BY genre ORDER BY imdb_score DESC NULLS LAST) AS rnk
   FROM  (
      SELECT unnest(genres) AS genre, imdb_score, title
      FROM   tbl
      ) sub
   ) sub2
WHERE  rnk = 1
ORDER  BY genre, title;

Does not eliminate winners with a NULL score - where no other score exists. (Unlike the solution with max().)

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228