PostgreSQL: Efficiently aggregate array columns as part of a group by

Question

We wish to perform a GROUP BY operation on a table. The original table contains an ARRAY column. Within a group, the content of these arrays should be transformed into a single array with unique elements. No ordering of these elements is required. contain Newest PostgreSQL versions are available.

Example original table:

  id | fruit  | flavors               
---: | :----- | :---------------------
     | apple  | {sweet,sour,delicious}
     | apple  | {sweet,tasty}         
     | banana | {sweet,delicious}

Exampled desired result:

count_total | aggregated_flavors          
----------: | :---------------------------
          1 | {delicious,sweet} 
          2 | {sour,tasty,delicious,sweet}

SQL toy code to create the original table:

CREATE TABLE example(id int, fruit text, flavors text ARRAY);
INSERT INTO example (fruit, flavors)
VALUES ('apple', ARRAY [ 'sweet','sour', 'delicious']),
       ('apple', ARRAY [ 'sweet','tasty' ]),
       ('banana', ARRAY [ 'sweet', 'delicious']);

We have come up with a solution requiring transforming the array to s

SELECT COUNT(*) AS count_total,
       array
  (SELECT DISTINCT unnest(string_to_array(replace(replace(string_agg(flavors::text, ','), '{', ''), '}', ''), ','))) AS aggregated_flavors
FROM example
GROUP BY fruit

However we think this is not optimal, and may be problematic as we assume that the string does neither contain "{", "}", nor ",". It feels like there must be functions to combine arrays in the way we need, but we weren't able to find them.

Thanks a lot everyone!

It is not clear to me what you are trying with the many nested string operations — S-Man, Aug 12 '19 at 15:10

S-Man · Accepted Answer · 2019-08-12T15:58:53.303

3

demo:db<>fiddle

Assuming each record contains a unique id value:

SELECT
    fruit,
    array_agg(DISTINCT flavor),      -- 2
    COUNT(DISTINCT id)               -- 3
FROM
    example,
    unnest(flavors) AS flavor        -- 1
GROUP BY fruit

unnest() array elements
Group by fruit value: array_agg() for distinct flavors
Group by fruit value: COUNT() for distinct ids with each fruit group.

if the id column is really empty, you could generate the id values for example with the row_number() window function:

demo:db<>fiddle

SELECT
    * 
FROM (
    SELECT
        *, row_number() OVER () as id
    FROM example
) s,
unnest(flavors) AS flavor

edited Aug 12 '19 at 15:58

answered Aug 12 '19 at 15:06

S-Man

22,521
7
40
63

Great answer, works like a charm! I would have never gotten the idea to shift the unnest to the FROM section. As a bonus, this approach extends naturally to aggregation of multiple Array columns. – elke Aug 13 '19 at 06:21

PostgreSQL: Efficiently aggregate array columns as part of a group by

1 Answers1

Linked