22

I'm looking for a sql pattern for an aggregate function to aggregate arrays. If I have 2 rows:

|id  |     array     |
|----+---------------|
|1   |    [1,2,3,4]  |
|1   |    [5,6]      |
|1   |    [7,8]      |
|--------------------|

And I want to do the following:

select id, *aggregate_function*(array) from table group by id

I want the result to be:

|id   | *aggregate_function*  |
|-----+-----------------------|
|1    | [1,2,3,4,5,6,7,8]     |
|-----------------------------|

There is no native postgres function that does this aggregation. But perhaps there's sql pattern that can be used here?

skmathur
  • 1,587
  • 5
  • 14
  • 21

3 Answers3

17

Something like this should work:

with mytable as
(
select 1 as id, array[1, 2, 3, 4] as myarray

union all

select 1 as id, array[5, 6] as myarray

union all 

select 1 as id, array[7, 8] as myarray
)

select
  id,
  array_agg(elements order by elements)
from mytable, unnest(myarray) as elements
group by id

There's some discussion about building the custom function here: Concatenate/merge array values during grouping/aggregation

Vincent
  • 7,808
  • 13
  • 49
  • 63
6

You can unnest and then group by:

WITH x (id, arr) AS (
  VALUES 
    (1, ARRAY[1,2,3,4])
  , (1, ARRAY[5,6])
  , (1, ARRAY[7, 8])
)
SELECT id, ARRAY_AGG(splitup) 
FROM (
    SELECT 
        id
      , unnest(arr) splitup 
    FROM x) splitup
GROUP BY 1
Haleemur Ali
  • 26,718
  • 5
  • 61
  • 85
  • How would this solution work if you have multiple array columns? Would you do a subquery for each column separately and join them somehow? – skmathur Nov 09 '17 at 18:52
  • Perhaps something like this, but I wonder if there's something more efficient ``` WITH x (id, arr, arr2) AS ( VALUES (1, ARRAY[1,2,3,4], array['a', 'b']) , (1, ARRAY[5,6], array['c']) , (1, ARRAY[7, 8], array['d', 'e', 'f']) ) SELECT id, ARRAY_AGG(distinct splitup), array_agg(distinct splitup2) as splitup2 FROM ( SELECT id , unnest(arr) splitup , unnest(arr2) as splitup2 FROM x ) splitup GROUP BY 1 ``` – Vincent Nov 09 '17 at 18:54
  • depends, are you only interested in distinct values. Then you can use the solution @vincent suggests in the priod comment. If you need to preserve record of an array value appearing multiple times, then you'd need multiple subqueries, otherwise you would end up with a fanout. – Haleemur Ali Nov 09 '17 at 18:59
4

One option is to create a custom aggregate:

CREATE FUNCTION array_union(a ANYARRAY, b ANYARRAY)
RETURNS ANYARRAY AS
$$
  SELECT array_agg(x)
  FROM (
    SELECT unnest(a) x
    UNION ALL
    SELECT unnest(b)
  ) AS u
$$ LANGUAGE SQL;

CREATE AGGREGATE array_union_agg(ANYARRAY) (
  SFUNC = array_union,
  STYPE = ANYARRAY,
  INITCOND = '{}'
);

You can then simply use:

WITH x(arr) AS (
  VALUES
    (ARRAY[1,2,3,4]),
    (ARRAY[5,6]),
    (ARRAY[7,8])
)
SELECT array_union_agg(arr)
FROM x;

The query results in:

 array_union_agg
-----------------
 {1,2,3,4,5,6,7,8}

The above version does array concatenation.

Making true union as if the arrays were sets can be achieved by replacing UNION ALL with UNION in the definition of array_union function.

Tregoreg
  • 18,872
  • 15
  • 48
  • 69