how to exclude null values in array_agg like in string_agg using postgres?

Question

If I use array_agg to collect names, I get my names separated by commas, but in case there is a null value, that null is also taken as a name in the aggregate. For example :

SELECT g.id,
       array_agg(CASE WHEN g.canonical = 'Y' THEN g.users ELSE NULL END) canonical_users,
       array_agg(CASE WHEN g.canonical = 'N' THEN g.users ELSE NULL END) non_canonical_users
FROM groups g
GROUP BY g.id;

it returns ,Larry,Phil instead of just Larry,Phil (in my 9.1.2, it shows NULL,Larry,Phil).

Instead, if I use string_agg(), it shows me only the names (without empty commas or nulls).

The problem is that I have Postgres 8.4 installed on the server, and string_agg() doesn't work there. Is there any way to make array_agg work similar to string_agg() ?

See this PostgreSQL mailing list thread on much this topic: http://postgresql.1045698.n5.nabble.com/array-agg-NULL-Handling-td2798942.html — Craig Ringer, Oct 29 '12 at 14:16
I am sorry, I don't think there is a solution in that thread.. — Daud, Oct 29 '12 at 14:33
There are two solutions in that thread. One is to create a function and the other (just suggested not shown) is the one I answered. — Clodoaldo Neto, Oct 29 '12 at 14:37
@Clodoaldo - all the rows will have canonical in ('y','n')... so the where clause seems to be redundant. The problem is that inside a grouping, if the the value of the canonical field is 'Y', and we are collecting 'N's, then a null be collected too.. — Daud, Oct 29 '12 at 14:44
@Daud That mailing list thread mainly explains that there *isn't* a really nice, simple solution and that you have to filter the elements going into `array_agg` for null if you don't want nulls. It's providing background and context, not an answer. — Craig Ringer, Oct 29 '12 at 23:43
I'm wondering if I should put together a patch for a null-ignoring `array_agg` variant, since it's so ugly to manually ignore nulls. — Craig Ringer, Oct 29 '12 at 23:45

Dale O'Brien · Answer 1 · 2022-02-19T08:25:48.667

376

With postgresql-9.3 one can do this;

SELECT g.id,
   array_remove(array_agg(CASE WHEN g.canonical = 'Y' THEN g.users ELSE NULL END), NULL) canonical_users,
   array_remove(array_agg(CASE WHEN g.canonical = 'N' THEN g.users ELSE NULL END), NULL) non_canonical_users
FROM groups g 
GROUP BY g.id;

Update: with postgresql-9.4;

SELECT g.id,
   array_agg(g.users) FILTER (WHERE g.canonical = 'Y') canonical_users,
   array_agg(g.users) FILTER (WHERE g.canonical = 'N') non_canonical_users
FROM groups g 
GROUP BY g.id;

Update (2022-02-19): also with postgresql-9.4;

This results in an empty array when all values in an array are null instead of returning null;

SELECT g.id,
  coalesce( array_agg(g.users) FILTER (WHERE g.canonical = 'Y'), '{}' ) canonical_users,
  coalesce( array_agg(g.users) FILTER (WHERE g.canonical = 'N'), '{}' ) non_canonical_users
FROM groups g 
GROUP BY g.id;

edited Feb 19 '22 at 08:25

answered Apr 14 '14 at 02:17

Dale O'Brien

4,108
2
15
10

7

This works and is fast and elegant, it solved me a problem similar to the OP's. A reason to upgrading to 9.3 for those who didn't do it yet. +1 – Pavel V. Aug 29 '14 at 08:41
16

The 9.4 is even more elegant. Works like a charm – jmgarnier Sep 08 '15 at 13:41
3

The 9.4 variant is even better, because what I need to filter away in my case is the nulls. – coladict May 12 '17 at 12:01
I used the updated version first, but then realised I needed to remove Nulls and duplicates, so went back to the first suggestion. It's a large query, but it's to create a materialized view, so not a huge issue. – Relequestual Sep 12 '18 at 11:00
4

In case it isn't obvious, for other situations you can do something like `array_agg(col_to_aggregate) FILTER (WHERE col_to_aggregate IS NOT NULL)` if you just want to apply the filter directly to that col. – Stephen Aug 25 '21 at 13:47

Alexi Theodore · Answer 2 · 2020-10-06T00:57:43.933

If you are looking for a modern answer to the general question of how to remove a NULL from an array, it is:

array_remove(your_array, NULL)

I was specifically curious about performance and wanted to compare this to the best possible alternative:

CREATE OR REPLACE FUNCTION strip_nulls(
    IN array_in ANYARRAY
)
RETURNS anyarray AS
'
SELECT
    array_agg(a)
FROM unnest(array_in) a
WHERE
    a IS NOT NULL
;
'
LANGUAGE sql
;

Doing a pgbench test proved (with high confidence) that array_remove() is a little more than twice as fast. I did my test on double precision numbers with a variety of array sizes (10, 100 and 1000 elements) and random NULLs in between.

It's also worth noting that this can be used to remove blanks ('' != NULL). But the second parameter accepts anyelement, and since it is most likely they you'd be indicating a blank with a string literal, make sure to cast it to the form you want, usually a non-array.

For example:

select array_remove(array['abc', ''], ''::text);

If you try:

select array_remove(array['abc', ''], '');

it will assume that the '' is TEXT[] (array) and will throw this error:

ERROR: malformed array literal: ""

@VivekSinha what version of postgres are you using? I just tested your query and it resulted in "{1,2,3}" for me. I am using 12.1. — Alexi Theodore, Aug 13 '20 at 02:43
Ah, I see @alexi-theodore what's happening at my end. I was using a custom+modified postgres driver. When I query directly in console, I can see the right output! Sorry about the confusion. Deleted prior comment and upvoted answer! — Vivek Sinha, Aug 14 '20 at 03:26
Probably it is helpful to note that array_remove is supported since 9.3 — Anatoly Rugalev, Oct 04 '20 at 12:48

score 38 · Accepted Answer · edited Aug 25 '21 at 13:22

38

select
    id,
    (select array_agg(a) from unnest(canonical_users) a where a is not null) canonical_users,
    (select array_agg(a) from unnest(non_canonical_users) a where a is not null) non_canonical_users
from (
    SELECT g.id,
           array_agg(CASE WHEN g.canonical = 'Y' THEN g.users ELSE NULL END) canonical_users,
           array_agg(CASE WHEN g.canonical = 'N' THEN g.users ELSE NULL END) non_canonical_users
    FROM groups g
    GROUP BY g.id
) s

Or, simpler and may be cheaper, using array_to_string which eliminates nulls:

SELECT
    g.id,
    array_to_string(
        array_agg(CASE WHEN g.canonical = 'Y' THEN g.users ELSE NULL END)
        , ','
    ) canonical_users,
    array_to_string(
        array_agg(CASE WHEN g.canonical = 'N' THEN g.users ELSE NULL END)
        , ','
    ) non_canonical_users
FROM groups g
GROUP BY g.id

edited Aug 25 '21 at 13:22

Stephen

8,508
12
56
96

answered Oct 29 '12 at 14:11

Clodoaldo Neto

118,695
26
233
260

Thanks. But if the main query (s) returns a 1000 rows, then the 2 subqueries (using unnest) will run once for each row.. Will it be better to tolerate NULLs than executing 2000 extra select queries ? – Daud Oct 29 '12 at 17:07
@Daud New version which could be cheaper. Take the explain output of both to be sure. – Clodoaldo Neto Oct 29 '12 at 17:34
3

@Clodoaldo If you're using `array_to_string(array_agg(...))` you might as well use `string_agg`. – Craig Ringer Oct 29 '12 at 23:44
1

@Craig The problem in the question is 8.4 – Clodoaldo Neto Oct 29 '12 at 23:57
@Clodoaldo Gah, old versions. Thanks. – Craig Ringer Oct 30 '12 at 00:00

score 15 · Answer 4 · edited May 23 '17 at 11:47

In solving the general question of removing nulls from array aggregates there are two main ways of attacking the problem: either doing array_agg(unnest(array_agg(x)) or creating a custom aggregate.

The first is of the form shown above:

SELECT 
    array_agg(u) 
FROM (
    SELECT 
        unnest(
            array_agg(v)
        ) as u 
    FROM 
        x
    ) un
WHERE 
    u IS NOT NULL;

The second:

/*
With reference to
http://ejrh.wordpress.com/2011/09/27/denormalisation-aggregate-function-for-postgresql/
*/
CREATE OR REPLACE FUNCTION fn_array_agg_notnull (
    a anyarray
    , b anyelement
) RETURNS ANYARRAY
AS $$
BEGIN

    IF b IS NOT NULL THEN
        a := array_append(a, b);
    END IF;

    RETURN a;

END;
$$ IMMUTABLE LANGUAGE 'plpgsql';

CREATE AGGREGATE array_agg_notnull(ANYELEMENT) (
    SFUNC = fn_array_agg_notnull,
    STYPE = ANYARRAY,
    INITCOND = '{}'
);

Calling the second is (naturally) a little nicer looking than the first:

select array_agg_notnull(v) from x;

score 9 · Answer 5 · answered Feb 25 '14 at 00:41

I am adding this even though this this thread is quite old, but I ran into this neat trick that works quite well on small arrays. It runs on Postgres 8.4+ without additional libraries or functions.

string_to_array(array_to_string(array_agg(my_column)))::int[]

The array_to_string() method actually gets rid of the nulls.

score 6 · Answer 6 · answered Feb 04 '21 at 11:11

You should wrap your array_agg with array_remove.

SELECT g.id,
       array_remove(array_agg(CASE WHEN g.canonical = 'Y' THEN g.users ELSE NULL END), NULL) canonical_users,
       array_remove(array_agg(CASE WHEN g.canonical = 'N' THEN g.users ELSE NULL END), NULL) non_canonical_users
FROM groups g
GROUP BY g.id;

score 3 · Answer 7 · answered Oct 29 '12 at 14:54

As has been suggested in the comments you can write a function to replace nulls in an array, however as also pointed out in the thread linked to in the comments, this kind of defeats the efficiency of the aggregate function if you have to create an aggregate, split it then aggregate it again.

I think keeping nulls in the array is just a (perhaps unwanted) feature of Array_Agg. You could use subqueries to avoid this:

SELECT  COALESCE(y.ID, n.ID) ID,
        y.Users,
        n.Users
FROM    (   SELECT  g.ID, ARRAY_AGG(g.Users) AS Users
            FROM    Groups g
            WHERE   g.Canonical = 'Y'
            GROUP BY g.ID
        ) y
        FULL JOIN 
        (   SELECT  g.ID, ARRAY_AGG(g.Users) AS Users
            FROM    Groups g
            WHERE   g.Canonical = 'N'
            GROUP BY g.ID
        ) n
            ON n.ID = y.ID

SQL FIDDLE

Thanks. But I needed 'case' to handle rows within a given grouping, and subqueries would be inefficient there — Daud, Oct 29 '12 at 15:15

Miklos · Answer 8 · 2019-10-29T13:12:23.033

It is very simple, just first of all create a new - (minus) operator for text[]:

CREATE OR REPLACE FUNCTION diff_elements_text
    (
        text[], text[] 
    )
RETURNS text[] as 
$$
    SELECT array_agg(DISTINCT new_arr.elem)
    FROM
        unnest($1) as new_arr(elem)
        LEFT OUTER JOIN
        unnest($2) as old_arr(elem)
        ON new_arr.elem = old_arr.elem
    WHERE old_arr.elem IS NULL
$$ LANGUAGE SQL IMMUTABLE;

CREATE OPERATOR - (
    PROCEDURE = diff_elements_text,
    leftarg = text[],
    rightarg = text[]
);

And simply subtract the array[null]:

select 
    array_agg(x)-array['']
from
    (   select 'Y' x union all
        select null union all
        select 'N' union all
        select '' 
    ) x;

That's all:

{Y, N}

`array_agg(x) FILTER (WHERE x is not null)` seems a lot easier: https://dbfiddle.uk/?rdbms=postgres_12&fiddle=92da5321389fa18265aa415072377b98 and you don't really need your own function, you can simply use `array_remove()` https://dbfiddle.uk/?rdbms=postgres_12&fiddle=4149f45f6304623720ac51a4cefb918a — , Oct 29 '19 at 12:56

score 0 · Answer 9 · edited Dec 03 '22 at 11:14

0

I did an array_except, the concept is array_except(x,y).

array_except(array_agg(x), array_agg(case when x is null then x end))

edited Dec 03 '22 at 11:14

Tyler2P

2,324
26
22
31

answered Nov 30 '22 at 17:25

Casey

1

how to exclude null values in array_agg like in string_agg using postgres?

9 Answers9

Linked