How to compare two arrays and pick only the non matching elements In postgres

Question

How can I pick only the non matching elements between two arrays.

Example:

base_array [12,3,5,7,8]
temp_array [3,7,8]

So here I want to compare both the arrays and remove the matching elements from the base array.

Now base_array should be like [12,5]

score 50 · Answer 1 · edited Jul 04 '15 at 18:57

50

I'd approach this with the array operator.

select array(select unnest(:arr1) except select unnest(:arr2));

If :arr1 and :arr2 don't intersect, using array_agg() leads to a null.

edited Jul 04 '15 at 18:57

seanlinsley

3,165
2
25
26

answered Jun 30 '11 at 12:55

Denis de Bernardy

75,850
13
131
154

The order of arrays produces different results here: select array(select unnest(ARRAY['1']) except select unnest(ARRAY['1','2'])) returns empty list but select array(select unnest(ARRAY['1','2']) except select unnest(ARRAY['1'])) returns {2}. – Brady Holt Oct 29 '15 at 15:15
8

@Brady: as it should, no? `{1} - {1,2} = {}`, `{1,2} - {1} = {2}`. – Denis de Bernardy Oct 30 '15 at 06:06
1

Just to highlight that this function is not stable, order in `:arr1` is not preserved. Thank for sharing this one-liner. – jlandercy Aug 11 '17 at 10:22
1

Using `explain analyse` in a loop of ~99990 iterations is nitid that this solution **is the faster** (consumes ~80% of the array_agg solution time). About resuts, is working fine. – Peter Krauss Aug 17 '18 at 22:40

score 31 · Answer 2 · answered Jun 30 '11 at 10:21

31

select array_agg(elements)
from (
  select unnest(array[12,3,5,7,8])
  except
  select unnest(array[3,7,8])
) t (elements)

answered Jun 30 '11 at 10:21

score 12 · Answer 3 · answered Sep 16 '13 at 16:08

I've constructed a set of functions to deal specifically with these types of issues: https://github.com/JDBurnZ/anyarray

The greatest thing is these functions work across ALL data-types, not JUST integers, as intarray is limited to.

After loading loading the functions defined in those SQL files from GitHub, all you'd need to do is:

SELECT
  ANYARRAY_DIFF(
    ARRAY[12, 3, 5, 7, 8],
    ARRAY[3, 7, 8]
  )

Returns something similar to: ARRAY[12, 5]

If you also need to return the values sorted:

SELECT
  ANYARRAY_SORT(
    ANYARRAY_DIFF(
      ARRAY[12, 3, 5, 7, 8],
      ARRAY[3, 7, 8]
    )
  )

Returns exactly: ARRAY[5, 12]

score 11 · Answer 4 · answered Jun 30 '11 at 19:30

Let's try the unnest() / except :

EXPLAIN ANALYZE SELECT array(select unnest(ARRAY[1,2,3,n]) EXCEPT SELECT unnest(ARRAY[2,3,4,n])) FROM generate_series( 1,10000 ) n;
 Function Scan on generate_series n  (cost=0.00..62.50 rows=1000 width=4) (actual time=1.373..140.969 rows=10000 loops=1)
   SubPlan 1
     ->  HashSetOp Except  (cost=0.00..0.05 rows=1 width=0) (actual time=0.011..0.011 rows=1 loops=10000)
           ->  Append  (cost=0.00..0.04 rows=2 width=0) (actual time=0.002..0.008 rows=8 loops=10000)
                 ->  Subquery Scan "*SELECT* 1"  (cost=0.00..0.02 rows=1 width=0) (actual time=0.002..0.003 rows=4 loops=10000)
                       ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.002 rows=4 loops=10000)
                 ->  Subquery Scan "*SELECT* 2"  (cost=0.00..0.02 rows=1 width=0) (actual time=0.001..0.003 rows=4 loops=10000)
                       ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.002 rows=4 loops=10000)
 Total runtime: 142.531 ms

And the intarray special operator :

EXPLAIN ANALYZE SELECT ARRAY[1,2,3,n] - ARRAY[2,3,4,n] FROM generate_series( 1,10000 ) n;
 Function Scan on generate_series n  (cost=0.00..15.00 rows=1000 width=4) (actual time=1.338..11.381 rows=10000 loops=1)
 Total runtime: 12.306 ms

Baseline :

EXPLAIN ANALYZE SELECT ARRAY[1,2,3,n], ARRAY[2,3,4,n] FROM generate_series( 1,10000 ) n;
 Function Scan on generate_series n  (cost=0.00..12.50 rows=1000 width=4) (actual time=1.357..7.139 rows=10000 loops=1)
 Total runtime: 8.071 ms

Time per array intersection :

intarray -           :  0.4 µs
unnest() / intersect : 13.4 µs

Of course the intarray way is much faster, but I find it amazing that postgres can zap a dependent subquery (which contains a hash and other stuff) in 13.4 µs...

MichaelG · Answer 5 · 2019-03-11T17:26:00.500

An extension to Denis' answer that returns the difference, regardless of which array was entered first. It's not the most concise query, maybe someone has a tidier way.

select array_cat(
   (select array(select unnest(a.b::int[]) except select unnest(a.c::int[]))),
   (select array(select unnest(a.c::int[]) except select unnest(a.b::int[]))))
from (select '{1,2}'::int[] b,'{1,3}'::int[] c) as a;

Returns:

{2,3}

score 1 · Answer 6 · answered Jun 30 '11 at 10:06

1

The contrib/intarray module provides this functionality--for arrays of integers, anyway. For other data types, you may have to write your own functions (or modify the ones provided with intarray).

answered Jun 30 '11 at 10:06

Jonathan Hall

75,165
16
143
189

1

Hi, intarray is good and fest... But where the `bigint` version of the project? Even a simple `count(*)` at PostgreSQL is a *bigint* datatype, not `int`. – Peter Krauss Aug 17 '18 at 22:11

score 0 · Answer 7 · edited Aug 17 '18 at 22:43

I would create a function using the same except logic as described by @a_horse_with_no_name:

CREATE FUNCTION array_subtract(a1 int[], a2 int[]) RETURNS int[] AS $$
DECLARE
    ret int[];
BEGIN
    IF a1 is null OR a2 is null THEN
        return a1;
    END IF;
    SELECT array_agg(e) INTO ret
    FROM (
        SELECT unnest(a1)
        EXCEPT
        SELECT unnest(a2)
    ) AS dt(e);
    RETURN ret;
END;
$$ language plpgsql;

Then you can use this function to change your base_array variable accordingly:

base_array := array_subtract(base_array, temp_array);

Using the @Denis's faster solution, and only SQL, we can express a generic function as

CREATE FUNCTION array_subtract(anyarray,anyarray) RETURNS anyarray AS $f$
  SELECT array(
    SELECT unnest($1)
    EXCEPT
    SELECT unnest($2)
  )
$f$ language SQL IMMUTABLE;

What does "dt" do? The one being used before the alias. I noticed if I don't use it each element of array is stored withing brackets i.e. () — Karan Parikh, Jan 02 '20 at 10:27
"dt" aliases the table, "e" aliases the column. If you don't include "dt" then it's selecting table rows as a single column. — danjuggler, Jan 06 '20 at 16:55

score 0 · Answer 8 · answered Jan 17 '23 at 08:53

Here's my proposal - it differs from the other answers in the following points:

it's a reusable function
most importantly, it keeps the original array's element order

CREATE OR REPLACE FUNCTION array_difference_ordered(array1 anyarray, array2 anyarray)
    RETURNS anyarray
    LANGUAGE sql
    IMMUTABLE AS
$$
SELECT ARRAY_AGG(array1_ordered_row.element ORDER BY array1_ordered_row.ordering)
FROM UNNEST(array1) WITH ORDINALITY AS array1_ordered_row(element, ordering)
WHERE NOT EXISTS
    (
        SELECT 1
        FROM UNNEST(array2) AS array2_row(element)
        WHERE array1_ordered_row.element = array2_row.element
    )
$$;

How to compare two arrays and pick only the non matching elements In postgres

8 Answers8

Linked