29

I have two array values of the same length in PostgreSQL:

{a,b,c} and {d,e,f}

and I'd like to combine them into

{{a,d},{b,e},{c,f}}

Is there a way to do that?

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
dan
  • 43,914
  • 47
  • 153
  • 254

2 Answers2

43

Postgres 9.5 or later

has array_agg(array expression):

array_agg ( anyarray ) → anyarray

Concatenates all the input arrays into an array of one higher dimension. (The inputs must all have the same dimensionality, and cannot be empty or null.)

This is a drop-in replacement for my custom aggregate function array_agg_mult() demonstrated below. It's implemented in C and considerably faster. Use it.

Postgres 9.4

Use the ROWS FROM construct or the updated unnest() which takes multiple arrays to unnest in parallel. Each can have a different length. You get (per documentation):

[...] the number of result rows in this case is that of the largest function result, with smaller results padded with null values to match.

Use this cleaner and simpler variant:

SELECT ARRAY[a,b] AS ab
FROM   unnest('{a,b,c}'::text[] 
            , '{d,e,f}'::text[]) x(a,b);

Postgres 9.3 or older

Simple zip()

Consider the following demo for Postgres 9.3 or earlier:

SELECT ARRAY[a,b] AS ab
FROM  (
   SELECT unnest('{a,b,c}'::text[]) AS a
        , unnest('{d,e,f}'::text[]) AS b
    ) x;

Result:

  ab
-------
 {a,d}
 {b,e}
 {c,f}

Note that both arrays must have the same number of elements to unnest in parallel, or you get a cross join instead.

You can wrap this into a function, if you want to:

CREATE OR REPLACE FUNCTION zip(anyarray, anyarray)
  RETURNS SETOF anyarray LANGUAGE SQL AS
$func$
SELECT ARRAY[a,b] FROM (SELECT unnest($1) AS a, unnest($2) AS b) x;
$func$;

Call:

SELECT zip('{a,b,c}'::text[],'{d,e,f}'::text[]);

Same result.

zip() to multi-dimensional array:

Now, if you want to aggregate that new set of arrays into one 2-dimenstional array, it gets more complicated.

SELECT ARRAY (SELECT ...)

or:

SELECT array_agg(ARRAY[a,b]) AS ab
FROM  (
   SELECT unnest('{a,b,c}'::text[]) AS a
         ,unnest('{d,e,f}'::text[]) AS b
    ) x

or:

SELECT array_agg(ARRAY[ARRAY[a,b]]) AS ab
FROM  ...

will all result in the same error message (tested with pg 9.1.5):

ERROR: could not find array type for data type text[]

But there is a way around this, as we worked out under this closely related question.
Create a custom aggregate function:

CREATE AGGREGATE array_agg_mult (anyarray) (
   SFUNC    = array_cat
 , STYPE    = anyarray
 , INITCOND = '{}'
);

And use it like this:

SELECT array_agg_mult(ARRAY[ARRAY[a,b]]) AS ab
FROM  (
   SELECT unnest('{a,b,c}'::text[]) AS a
        , unnest('{d,e,f}'::text[]) AS b
    ) x

Result:

{{a,d},{b,e},{c,f}}

Note the additional ARRAY[] layer! Without it and just:

SELECT array_agg_mult(ARRAY[a,b]) AS ab
FROM ...

You get:

{a,d,b,e,c,f}

Which may be useful for other purposes.

Roll another function:

CREATE OR REPLACE FUNCTION zip2(anyarray, anyarray)
  RETURNS SETOF anyarray LANGUAGE SQL AS
$func$
SELECT array_agg_mult(ARRAY[ARRAY[a,b]])
FROM (SELECT unnest($1) AS a, unnest($2) AS b) x;
$func$;

Call:

SELECT zip2('{a,b,c}'::text[],'{d,e,f}'::text[]); -- or any other array type

Result:

{{a,d},{b,e},{c,f}}
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • 1
    In PostgreSQL 9.3 and above it'll be possible to rewrite this to use the SQL-standard `LATERAL` instead of PostgreSQL's sometimes "interesting" set-returning-functions-in-select-list feature. Until then, beware when using arrays of different lengths, you might be surprised by the results. – Craig Ringer Sep 13 '12 at 22:44
  • 2
    @CraigRinger: Right. For the purpose of this question, the OP defined arrays `of the same length`, though. – Erwin Brandstetter Sep 13 '12 at 22:48
  • 2
    Yep, just making sure it's noted, because others who find this might not. Personally I'd write it in PL/PgSQL and add a sanity check. – Craig Ringer Sep 13 '12 at 22:49
8

Here's another approach that's safe for arrays of differing lengths, using the array multi-aggregation mentioned by Erwin:

CREATE OR REPLACE FUNCTION zip(array1 anyarray, array2 anyarray) RETURNS text[]
AS $$
SELECT array_agg_mult(ARRAY[ARRAY[array1[i],array2[i]]])
FROM generate_subscripts(
  CASE WHEN array_length(array1,1) >= array_length(array2,1) THEN array1 ELSE array2 END,
  1
) AS subscripts(i)
$$ LANGUAGE sql;

regress=> SELECT zip('{a,b,c}'::text[],'{d,e,f}'::text[]);
         zip         
---------------------
 {{a,d},{b,e},{c,f}}
(1 row)


regress=> SELECT zip('{a,b,c}'::text[],'{d,e,f,g}'::text[]);
             zip              
------------------------------
 {{a,d},{b,e},{c,f},{NULL,g}}
(1 row)

regress=> SELECT zip('{a,b,c,z}'::text[],'{d,e,f}'::text[]);
             zip              
------------------------------
 {{a,d},{b,e},{c,f},{z,NULL}}
(1 row)

If you want to chop off the excess rather than null-padding, just change the >= length test to <= instead.

This function does not handle the rather bizarre PostgreSQL feature that arrays may have a stating element other than 1, but in practice nobody actually uses that feature. Eg with a zero-indexed 3-element array:

regress=> SELECT zip('{a,b,c}'::text[], array_fill('z'::text, ARRAY[3], ARRAY[0]));
          zip           
------------------------
 {{a,z},{b,z},{c,NULL}}
(1 row)

wheras Erwin's code does work with such arrays, and even with multi-dimensional arrays (by flattening them) but does not work with arrays of differing length.

Arrays are a bit special in PostgreSQL, they're a little too flexible with multi-dimensional arrays, configurable origin index, etc.

In 9.4 you'll be able to write:

SELECT array_agg_mult(ARRAY[ARRAY[a,b])
FROM unnest(array1) WITH ORDINALITY as (o,a)
NATURAL FULL OUTER JOIN
unnest(array2) WITH ORDINALITY as (o,b);

which will be a lot nicer, especially if an optimisation to scan the functions together rather than doing a sort and join goes in.

Craig Ringer
  • 307,061
  • 76
  • 688
  • 778
  • Thanks for the answer. So you're not a fan of using the Array datatype in Postgres? – dan Jul 29 '13 at 19:12
  • 1
    @dan On the contrary, it's *extremely* useful, particularly as an intermediate result in complex queries and as an optimisation. It's just that the way arrays are implemented in Pg makes their behaviour a little ... surprising ... at times. Arrays with configurable starting indexes, multi-dimensional arrays under a single array type, etc. I just wish there was a more conventional simplified array type. – Craig Ringer Jul 30 '13 at 00:58
  • 1
    Nice add and interesting bit of news from pg 9.4. BTW.: One can use `arr_var[-2147483648:2147483647]` to "normalize" array subscripts; [by courtesy of Daniel](http://stackoverflow.com/questions/12011569/normalize-array-subscripts-for-1-dimensional-array-so-they-start-with-1). – Erwin Brandstetter Jul 30 '13 at 06:30