43

How can I pick only the non matching elements between two arrays.

Example:

base_array [12,3,5,7,8]
temp_array [3,7,8]

So here I want to compare both the arrays and remove the matching elements from the base array.

Now base_array should be like [12,5]

ggvvkk
  • 431
  • 1
  • 4
  • 3

8 Answers8

50

I'd approach this with the array operator.

select array(select unnest(:arr1) except select unnest(:arr2));

If :arr1 and :arr2 don't intersect, using array_agg() leads to a null.

seanlinsley
  • 3,165
  • 2
  • 25
  • 26
Denis de Bernardy
  • 75,850
  • 13
  • 131
  • 154
  • The order of arrays produces different results here: select array(select unnest(ARRAY['1']) except select unnest(ARRAY['1','2'])) returns empty list but select array(select unnest(ARRAY['1','2']) except select unnest(ARRAY['1'])) returns {2}. – Brady Holt Oct 29 '15 at 15:15
  • 8
    @Brady: as it should, no? `{1} - {1,2} = {}`, `{1,2} - {1} = {2}`. – Denis de Bernardy Oct 30 '15 at 06:06
  • 1
    Just to highlight that this function is not stable, order in `:arr1` is not preserved. Thank for sharing this one-liner. – jlandercy Aug 11 '17 at 10:22
  • 1
    Using `explain analyse` in a loop of ~99990 iterations is nitid that this solution **is the faster** (consumes ~80% of the array_agg solution time). About resuts, is working fine. – Peter Krauss Aug 17 '18 at 22:40
31
select array_agg(elements)
from (
  select unnest(array[12,3,5,7,8])
  except
  select unnest(array[3,7,8])
) t (elements)
12

I've constructed a set of functions to deal specifically with these types of issues: https://github.com/JDBurnZ/anyarray

The greatest thing is these functions work across ALL data-types, not JUST integers, as intarray is limited to.

After loading loading the functions defined in those SQL files from GitHub, all you'd need to do is:

SELECT
  ANYARRAY_DIFF(
    ARRAY[12, 3, 5, 7, 8],
    ARRAY[3, 7, 8]
  )

Returns something similar to: ARRAY[12, 5]

If you also need to return the values sorted:

SELECT
  ANYARRAY_SORT(
    ANYARRAY_DIFF(
      ARRAY[12, 3, 5, 7, 8],
      ARRAY[3, 7, 8]
    )
  )

Returns exactly: ARRAY[5, 12]

Joshua Burns
  • 8,268
  • 4
  • 48
  • 61
11

Let's try the unnest() / except :

EXPLAIN ANALYZE SELECT array(select unnest(ARRAY[1,2,3,n]) EXCEPT SELECT unnest(ARRAY[2,3,4,n])) FROM generate_series( 1,10000 ) n;
 Function Scan on generate_series n  (cost=0.00..62.50 rows=1000 width=4) (actual time=1.373..140.969 rows=10000 loops=1)
   SubPlan 1
     ->  HashSetOp Except  (cost=0.00..0.05 rows=1 width=0) (actual time=0.011..0.011 rows=1 loops=10000)
           ->  Append  (cost=0.00..0.04 rows=2 width=0) (actual time=0.002..0.008 rows=8 loops=10000)
                 ->  Subquery Scan "*SELECT* 1"  (cost=0.00..0.02 rows=1 width=0) (actual time=0.002..0.003 rows=4 loops=10000)
                       ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.002 rows=4 loops=10000)
                 ->  Subquery Scan "*SELECT* 2"  (cost=0.00..0.02 rows=1 width=0) (actual time=0.001..0.003 rows=4 loops=10000)
                       ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.002 rows=4 loops=10000)
 Total runtime: 142.531 ms

And the intarray special operator :

EXPLAIN ANALYZE SELECT ARRAY[1,2,3,n] - ARRAY[2,3,4,n] FROM generate_series( 1,10000 ) n;
 Function Scan on generate_series n  (cost=0.00..15.00 rows=1000 width=4) (actual time=1.338..11.381 rows=10000 loops=1)
 Total runtime: 12.306 ms

Baseline :

EXPLAIN ANALYZE SELECT ARRAY[1,2,3,n], ARRAY[2,3,4,n] FROM generate_series( 1,10000 ) n;
 Function Scan on generate_series n  (cost=0.00..12.50 rows=1000 width=4) (actual time=1.357..7.139 rows=10000 loops=1)
 Total runtime: 8.071 ms

Time per array intersection :

intarray -           :  0.4 µs
unnest() / intersect : 13.4 µs

Of course the intarray way is much faster, but I find it amazing that postgres can zap a dependent subquery (which contains a hash and other stuff) in 13.4 µs...

bobflux
  • 11,123
  • 3
  • 27
  • 27
1

An extension to Denis' answer that returns the difference, regardless of which array was entered first. It's not the most concise query, maybe someone has a tidier way.

select array_cat(
   (select array(select unnest(a.b::int[]) except select unnest(a.c::int[]))),
   (select array(select unnest(a.c::int[]) except select unnest(a.b::int[]))))
from (select '{1,2}'::int[] b,'{1,3}'::int[] c) as a;

Returns:

{2,3}
MichaelG
  • 652
  • 1
  • 10
  • 17
1

The contrib/intarray module provides this functionality--for arrays of integers, anyway. For other data types, you may have to write your own functions (or modify the ones provided with intarray).

Jonathan Hall
  • 75,165
  • 16
  • 143
  • 189
  • 1
    Hi, intarray is good and fest... But where the `bigint` version of the project? Even a simple `count(*)` at PostgreSQL is a *bigint* datatype, not `int`. – Peter Krauss Aug 17 '18 at 22:11
0

I would create a function using the same except logic as described by @a_horse_with_no_name:

CREATE FUNCTION array_subtract(a1 int[], a2 int[]) RETURNS int[] AS $$
DECLARE
    ret int[];
BEGIN
    IF a1 is null OR a2 is null THEN
        return a1;
    END IF;
    SELECT array_agg(e) INTO ret
    FROM (
        SELECT unnest(a1)
        EXCEPT
        SELECT unnest(a2)
    ) AS dt(e);
    RETURN ret;
END;
$$ language plpgsql;

Then you can use this function to change your base_array variable accordingly:

base_array := array_subtract(base_array, temp_array);

Using the @Denis's faster solution, and only SQL, we can express a generic function as

CREATE FUNCTION array_subtract(anyarray,anyarray) RETURNS anyarray AS $f$
  SELECT array(
    SELECT unnest($1)
    EXCEPT
    SELECT unnest($2)
  )
$f$ language SQL IMMUTABLE;
Peter Krauss
  • 13,174
  • 24
  • 167
  • 304
danjuggler
  • 1,261
  • 2
  • 21
  • 38
  • 1
    What does "dt" do? The one being used before the alias. I noticed if I don't use it each element of array is stored withing brackets i.e. () – Karan Parikh Jan 02 '20 at 10:27
  • "dt" aliases the table, "e" aliases the column. If you don't include "dt" then it's selecting table rows as a single column. – danjuggler Jan 06 '20 at 16:55
0

Here's my proposal - it differs from the other answers in the following points:

  • it's a reusable function
  • most importantly, it keeps the original array's element order
CREATE OR REPLACE FUNCTION array_difference_ordered(array1 anyarray, array2 anyarray)
    RETURNS anyarray
    LANGUAGE sql
    IMMUTABLE AS
$$
SELECT ARRAY_AGG(array1_ordered_row.element ORDER BY array1_ordered_row.ordering)
FROM UNNEST(array1) WITH ORDINALITY AS array1_ordered_row(element, ordering)
WHERE NOT EXISTS
    (
        SELECT 1
        FROM UNNEST(array2) AS array2_row(element)
        WHERE array1_ordered_row.element = array2_row.element
    )
$$;
Kamil Kalinowski
  • 436
  • 5
  • 13