49

I have a column which is of type integer array. How can I merge all of them into a single integer array?

For example: If I execute query:

select column_name from table_name

I get result set as:

-[RECORD 1]----------
column_name | {1,2,3}
-[RECORD 2]----------
column_name | {4,5}

How can I get {1,2,3,4,5} as final result?

user1465266
  • 621
  • 1
  • 6
  • 12

6 Answers6

67

You could use unnest to open up the arrays and then array_agg to put them back together:

select array_agg(c)
from (
  select unnest(column_name)
  from table_name
) as dt(c);
mu is too short
  • 426,620
  • 70
  • 833
  • 800
25

Define a trivial custom aggregate:

CREATE AGGREGATE array_cat_agg(anyarray) (
  SFUNC=array_cat,
  STYPE=anyarray
);

and use it:

WITH v(a) AS ( VALUES (ARRAY[1,2,3]), (ARRAY[4,5,6,7]))
SELECT array_cat_agg(a) FROM v;

If you want a particular order, put it within the aggregate call, i.e. array_cat_agg(a ORDER BY ...)

This is roughly O(n log n) for n rows (I think) O(n²) so it is unsuitable for long sets of rows. For better performance you'd need to write it in C, where you can use the more efficient (but horrible to use) C API for PostgreSQL arrays to avoid re-copying the array each iteration.

Craig Ringer
  • 307,061
  • 76
  • 688
  • 778
  • 2
    FWIW this aggregate runs in quadratic time `O(n^2)` and so isn't suitable for large datasets. source: I used it on some large datasets in production and had to rip it out =) – John Bledsoe Mar 17 '21 at 14:03
  • @JohnBledsoe I'm surprised it's `O(n^2)`, are you sure? It copies the whole array once per iteration, including all prior members, when it constructs a new one. Either way, it'll still be plenty slow for long inputs. – Craig Ringer Mar 23 '21 at 05:32
  • 1
    I've been out of CS school for a long time, so I'm not sure =) but yeah copying an N-length array N times is `O(n^2)` IIRC. – John Bledsoe Mar 24 '21 at 12:02
  • 1
    @JohnBledsoe The array starts at length 1. Each time you copy it, it grows by 1 element. Assuming each input array is of the same length (treated as 1 element for this purpose): 1 elements copied @ n=1 . 3 elements copied @ n=2 . 6 elements copied @ n=3. 10 elements copied @ n=4. It's a series sum n∑n . Which is (n·n)/2 or n²/2 .. so O(n^2). You're quite right. Pg doesn't have mutable arrays at the SQL level so you'd need to use a PL (say, Python with `numpy` or `intarray`) or use C to do it more efficiently. – Craig Ringer Mar 24 '21 at 13:25
  • Not sure about the mathematics here, but from my experience it was very slow too. Took forever (I gave up after 30 seconds) on a 300K rows table, while mu-is-too-short's solution took 215ms. – Ezra Steinmetz Mar 20 '22 at 13:36
  • I did manage to optimize your solution to 2.5 seconds by grouping the rows ( https://gist.github.com/ezra100/1ab9be45eb21569b5aa0e6212495b1c9 ), but mu-is-too-short's is simpler and faster – Ezra Steinmetz Mar 20 '22 at 13:51
3

to merge arrays you can use || operator more

to put result in flat list use uunest func

example:

select unnest(ARRAY[1,2] || ARRAY[3,2] || ARRAY[4,5]) as number;

result

Maxim
  • 2,233
  • 6
  • 16
2
string_to_array(string_agg(array_to_string(column_name ,','),','),',')

This can be some clue for your situation. I've done like this.

mouse500
  • 41
  • 4
  • Please provide additional details in your answer. As it's currently written, it's hard to understand your solution. – Community Sep 02 '21 at 04:24
1

You could use lateral subquery for that:

select array_agg(u.a)
from (values (array[1, 2, 3]), (array[4, 5])) t (a)
    join lateral unnest(t.a) u (a) on true;
-2

The only way you can do this is inside a function:

CREATE FUNCTION merge_arrays() RETURNS int[] AS $$
DECLARE
  this record;
  res  int[];
BEGIN
  FOR this IN
    SELECT column_name FROM table_name
  LOOP
    array_cat(res, this.column_name);
  END LOOP;
  RETURN res;
END; $$ LANGUAGE plpgsql;

Then you can

SELECT merge_arrays();

to get the result you are looking for.

This of course hard-codes your table definition into the function, which may (or may not) be an issue. In addition, you may want to put a WHERE clause in the loop query to restrict the records whose arrays you want to append; you might use an additional function parameter to do this.

Keep in mind that you might get a really large array as your table increases in size and that may affect performance. Do you really need all sub-arrays from all records in one large array? Have a look at your application and see if you can do the merge at that level, rather than in a single query.

Patrick
  • 29,357
  • 6
  • 62
  • 90