6

We have a problem grouping arrays into a single array. We want to join the values from two columns into one single array and aggregate these arrays of multiple rows.

Given the following input:

| id | name | col_1 | col_2 |
| 1  |  a   |   1   |   2   |
| 2  |  a   |   3   |   4   |
| 4  |  b   |   7   |   8   |
| 3  |  b   |   5   |   6   |

We want the following output:

| a | { 1, 2, 3, 4 } |
| b | { 5, 6, 7, 8 } |

The order of the elements is important and should correlate with the id of the aggregated rows.

We tried the array_agg() function:

SELECT array_agg(ARRAY[col_1, col_2]) FROM mytable GROUP BY name;

Unfortunately, this statement raises an error:

ERROR: could not find array type for data type character varying[]

It seems to be impossible to merge arrays in a group by clause using array_agg().

Any ideas?

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
tbz
  • 197
  • 2
  • 10

2 Answers2

6

UNION ALL

You could "unpivot" with UNION ALL first:

SELECT name, array_agg(c) AS c_arr
FROM  (
   SELECT name, id, 1 AS rnk, col1 AS c FROM tbl
   UNION ALL
   SELECT name, id, 2, col2 FROM tbl
   ORDER  BY name, id, rnk
   ) sub
GROUP  BY 1;

Adapted to produce the order of values you later requested. The manual:

The aggregate functions array_agg, json_agg, string_agg, and xmlagg, as well as similar user-defined aggregate functions, produce meaningfully different result values depending on the order of the input values. This ordering is unspecified by default, but can be controlled by writing an ORDER BY clause within the aggregate call, as shown in Section 4.2.7. Alternatively, supplying the input values from a sorted subquery will usually work.

Bold emphasis mine.

LATERAL subquery with VALUES expression

LATERAL requires Postgres 9.3 or later.

SELECT t.name, array_agg(c) AS c_arr
FROM  (SELECT * FROM tbl ORDER BY name, id) t
CROSS  JOIN LATERAL (VALUES (t.col1), (t.col2)) v(c)
GROUP  BY 1;

Same result. Only needs a single pass over the table.

Custom aggregate function

Or you could create a custom aggregate function like discussed in these related answers:

CREATE AGGREGATE array_agg_mult (anyarray)  (
    SFUNC     = array_cat
  , STYPE     = anyarray
  , INITCOND  = '{}'
);

Then you can:

SELECT name, array_agg_mult(ARRAY[col1, col2] ORDER BY id) AS c_arr
FROM   tbl
GROUP  BY 1
ORDER  BY 1;

Or, typically faster, while not standard SQL:

SELECT name, array_agg_mult(ARRAY[col1, col2]) AS c_arr
FROM  (SELECT * FROM tbl ORDER BY name, id) t
GROUP  BY 1;

The added ORDER BY id (which can be appended to such aggregate functions) guarantees your desired result:

a | {1,2,3,4}
b | {5,6,7,8}

Or you might be interested in this alternative:

SELECT name, array_agg_mult(ARRAY[ARRAY[col1, col2]] ORDER BY id) AS c_arr
FROM   tbl
GROUP  BY 1
ORDER  BY 1;

Which produces 2-dimensional arrays:

a | {{1,2},{3,4}}
b | {{5,6},{7,8}}

The last one can be replaced (and should be, as it's faster!) with the built-in array_agg() in Postgres 9.5 or later - with its added capability of aggregating arrays:

SELECT name, array_agg(ARRAY[col1, col2] ORDER BY id) AS c_arr
FROM   tbl
GROUP  BY 1
ORDER  BY 1;

Same result. The manual:

input arrays concatenated into array of one higher dimension (inputs must all have same dimensionality, and cannot be empty or null)

So not exactly the same as our custom aggregate function array_agg_mult();

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • I wanted to avoid creating a custom function and use only built-in ones if possible. Despite that, using the custom function would solve my problem. – tbz Jul 03 '14 at 21:13
  • @tbz: All solutions should work now, as well as parallel unnesting provided by Clodoaldo. – Erwin Brandstetter Jul 03 '14 at 21:21
  • The UNION ALL solution is the one I was looking for. Why is the faster one with the subquery not SQL standard? – tbz Jul 03 '14 at 21:31
  • @tbz: It's just an implementation detail in Postgres that it uses sorted input. This has no place in a set-based logic, but it is typically much more efficient to sort the input once, than to sort per group in the aggregated output. That's why the manual only states "will usually work" ... – Erwin Brandstetter Jul 03 '14 at 21:54
  • is there a way to write `array_agg_mult` so that it removes duplicate items? – strider Jun 16 '20 at 20:13
  • 1
    @strider: There are various ways. You need to define exactly what qualifies as "duplicate". Exact data type may be relevant, too. In any case, you cannot simply add the keyword `DISTINCT` to the aggregate call to remove duplicate base element. And you may need to special-case NULL values. Please ask a new *question* with all relevant details. You can always link to this one for context and add another comment here to link back (and get my attention). I applied some updates above while being here. – Erwin Brandstetter Jun 19 '20 at 02:07
1
select n, array_agg(c) as c
from (
    select n, unnest(array[c1, c2]) as c
    from t
) s
group by n

Or simpler

select
    n,
    array_agg(c1) || array_agg(c2) as c
from t
group by n

To address the new ordering requirement:

select n, array_agg(c order by id, o) as c
from (
    select
        id, n,
        unnest(array[c1, c2]) as c,
        unnest(array[1, 2]) as o
    from t
) s
group by n
Clodoaldo Neto
  • 118,695
  • 26
  • 233
  • 260
  • Your solutions create two different orders of the array elements. Unfortunately, I forgot to mention that the order of the elements is important. I edited my question to clarify this. – tbz Jul 03 '14 at 20:59
  • +1 to statement pipes to concatenate arrays when arrays are small, you only need to pass all values to another function and ordering is not required – cortopy Jan 29 '16 at 11:11