14

I have a table with arrays as one column, and I want to sum the array elements together:

> create table regres(a int[] not null);
> insert into regres values ('{1,2,3}'), ('{9, 12, 13}');
> select * from regres;
     a
-----------
 {1,2,3}
 {9,12,13}

I want the result to be:

{10, 14, 16}

that is: {1 + 9, 2 + 12, 3 + 13}.

Does such a function already exist somewhere? The intagg extension looked like a good candidate, but such a function does not already exist.

The arrays are expected to be between 24 and 31 elements in length, all elements are NOT NULL, and the arrays themselves will also always be NOT NULL. All elements are basic int. There will be more than two rows per aggregate. All arrays will have the same number of elements, in a query. Different queries will have different number of elements.

My implementation target is: PostgreSQL 9.1.13

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
François Beausoleil
  • 16,265
  • 11
  • 67
  • 90
  • 1
    Also posted to pgsql-general mailing list: [Pairwise array sum aggregate function?](http://www.postgresql.org/message-id/BFABC9D5-D791-428E-95B9-8DB690E74711@teksol.info) – François Beausoleil Jul 28 '14 at 14:10
  • 1
    If you can live with some excessive typing: `select array[sum(a[1]), sum(a[2]), sum(a[3]), sum(a[4]), ....] from regres;` –  Jul 28 '14 at 14:17
  • @a_horse_with_no_name I hadn't thought of that. Excellent thinking outside of the box. I generate the query, so it's not really a problem to write such a select clause. – François Beausoleil Jul 28 '14 at 14:25

3 Answers3

17

General solutions for any number of arrays with any number of elements. Individual elements or the the whole array can be NULL, too:

Simpler in 9.4+ using WITH ORDINALITY

SELECT ARRAY (
   SELECT sum(elem)
   FROM  tbl t
       , unnest(t.arr) WITH ORDINALITY x(elem, rn)
   GROUP BY rn
   ORDER BY rn
   );

See:

Postgres 9.3+

This makes use of an implicit LATERAL JOIN

SELECT ARRAY (
   SELECT sum(arr[rn])
   FROM   tbl t
        , generate_subscripts(t.arr, 1) AS rn
   GROUP  BY rn
   ORDER  BY rn
   );

See:

Postgres 9.1

SELECT ARRAY (
   SELECT sum(arr[rn])
   FROM  (
      SELECT arr, generate_subscripts(arr, 1) AS rn
      FROM   tbl t
      ) sub
   GROUP  BY rn
   ORDER  BY rn
   );

The same works in later versions, but set-returning functions in the SELECT list are not standard SQL and were frowned upon by some. Should be OK since Postgres 10, though. See:

db<>fiddle here
Old sqlfiddle

Related:

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
4

If you need better performances and can install Postgres extensions, the agg_for_vecs C extension provides a vec_to_sum function that should meet your need. It also offers various aggregate functions like min, max, avg, and var_samp that operate on arrays instead of scalars.

sebastibe
  • 586
  • 3
  • 13
3

I know the original question and answer are pretty old, but for others who find this... The most elegant and flexible solution I've found is to create a custom aggregate function. Erwin's answer presents some great simple solutions if you only need the single resulting array, but doesn't translate to a solution that could include other table columns and aggregations, in a GROUP BY for example.

With a custom array_add function and array_sum aggregate function:

CREATE OR REPLACE FUNCTION array_add(_a numeric[], _b numeric[])
  RETURNS numeric[]
AS
$$
BEGIN
  RETURN ARRAY(
    SELECT coalesce(a, 0) + coalesce(b, 0)
    FROM unnest(_a, _b) WITH ORDINALITY AS x(a, b, n)
    ORDER BY n
  );
END
$$ LANGUAGE plpgsql;

CREATE AGGREGATE array_sum(numeric[])
(
  sfunc = array_add,
  stype = numeric[],
  initcond = '{}'
);

Then (using the names from your example):

SELECT array_sum(a) a_sums
FROM regres;

Returns your array of sums, and it can just as well be used anywhere other aggregate functions could be used, so if your table also had a column name you wanted to group by, and another array of numbers, column b:

SELECT name, array_sum(a) a_sums, array_sum(b) b_sums
FROM regres
GROUP BY name;

You won't get quite the performance you'd get out of the built-in sum function and just selecting sum(a[1]), sum(a[2]), sum(a[3]), you'd have to implement the array_add function as a compiled C function to get that. But in cases where you don't have the ability to add custom C functions (like a managed cloud database, e.g. AWS RDS), or you're not aggregating huge numbers of rows, the difference probably won't be noticed.

reads0520
  • 666
  • 8
  • 17