Array 1 contained in array 2 and elements in the same order

Question

Does PostgreSQL have some way for me to find if an array is contained within another array, but with the same ordering?
For example, I want to know if array1 is within array2 with matching elements in the same order.

array1[1, 3, 6, 8]
array2[3, 8, 2, 9, 10, 1, 6]

Obviously not the case in the example, but is there a built-in method for this in PostgreSQL or should I create my own function?

Version of PostgreSQL is 9.6. The actual numbers the query will run on are bigints.

if you compare arrays in order, why not compare `array2::text like concat('%',array1::text,'%')` ?.. — Vao Tsun, Mar 01 '17 at 13:36
@VaoTsun, If `array2[1,4,6,10,12]` and `array1[1,6,12]` then `array2::text like concat('%',array1::text,'%')` would be false, but it should be true according to OP's requirements. However `array2::text like concat('%',array_to_string(array1, '%'),'%')` may work (I haven't tested). — JNevill, Mar 01 '17 at 13:42
yes - I did not think about possible gaps in order - true :) — Vao Tsun, Mar 01 '17 at 13:43
Ah, didn't even think about using the like operator, but that's clever. The second solution is spot on. Thanks a bunch! — Philip, Mar 01 '17 at 13:47
@Philip: the `like` solution will not work for e.g. `array[1,2]` and `array[11,2]` — , Mar 01 '17 at 14:38
For integer arrays or for other types (too)? Are duplicate elements possible? Do you use this test for queries on (big) tables? And please always declare your version of Postgres. — Erwin Brandstetter, Mar 01 '17 at 15:34
@ErwinBrandstetter It's on bigint arrays only. Duplicate elements are possible as the integers represent an intersection (road network) and they could possible drive in circles. The queries will have to sift through at least 250,000 rows, possible more. PostgreSQL version 9.6. — Philip, Mar 02 '17 at 08:22

score 2 · Accepted Answer · edited Jun 20 '20 at 09:12

General case

All elements of the second array are in the first, too. In the same order, but gaps are allowed.

I suggest this polymorphic PL/pgSQL function:

CREATE OR REPLACE FUNCTION array_contains_array_in_order(arr1 ANYARRAY
                                                       , arr2 ANYARRAY
                                                       , elem ANYELEMENT = NULL)
  RETURNS bool AS
$func$
DECLARE
   pos int := 1;
BEGIN
   FOREACH elem in ARRAY arr2
   LOOP
      pos := pos + array_position(arr1[pos:], elem);  -- see below
      IF pos IS NULL THEN
         RETURN FALSE;
      END IF;
   END LOOP; 

   RETURN true; --  all elements found in order
END
$func$ LANGUAGE plpgsql IMMUTABLE COST 3000;

As @a_horse commented, we can omit the upper bound in array subscripts to mean "unbounded" (arr1[pos:]). In older versions before 9.6 substitute with arr1[pos:2147483647] - 2147483647 = 2^31 - 1 being the theoretical max array index, the greatest signed int4 number.

This works for ...

any 1-dimensional array type, not just integer[].
arrays with NULL values, thanks to array_position() which works for NULL, too.
arrays with duplicate elements.
only for default array subscripts starting with 1. You can easily cover non-standard subscripts if needed:
Normalize array subscripts for 1-dimensional array so they start with 1

About the ANYELEMENT trick:

How do I get the type of an array's elements?

Performance

I ran a quick performance test comparing this function to the one @a_horse supplied. This one was around 5x faster.

If you use this a filter for a big table I strongly suggest you combine it with a (logically redundant) sargable filter like:

SELECT *
FROM   tbl
WHERE  arr @> '{2,160,134,58,149,111}'::int[] 
AND    array_contains_array_in_order(arr, '{2,160,134,58,149,111}')

This will use a GIN index on the array column like:

CREATE INDEX ON tbl USING gin (arr);

And only filter remaining (typically very few!) arrays that share all elements. Typically much faster.

Caveats with intarray module

Note: applies to integer[] exclusively, not smallint[] or bigint[] or any other array type!

Careful, if you have installed the intarray extension, which provides its own variant of the @> operator for int[]. Either you create an (additional) GIN index with its special operator class (which is a bit faster where applicable):

CREATE INDEX ON intarr USING gin (arr gin__int_ops);

Or, while you only have a GIN index with the default operator class, you must explicitly denote the standard operator to cooperate with the index:

WHERE  arr OPERATOR(pg_catalog.@>) '{2,160,134,58,149,111}'::int[]

Details:

GIN index on smallint[] column not used or error "operator is not unique"

Simple case

As commented, your case is simpler:

The complete second array is included in the first (same order, no gaps!).

CREATE OR REPLACE FUNCTION array_contains_array_exactly(arr1 ANYARRAY, arr2 ANYARRAY)
  RETURNS bool AS
$func$
DECLARE
   len int := array_length(arr2, 1) - 1;  -- length of arr2 - 1 to fix off-by-1
   pos int;                               -- for current search postition in arr1
BEGIN
   /*                                     -- OPTIONAL, if invalid input possible
   CASE array_length(arr1, 1) >  len      -- array_length(arr2, 1) - 1 
   WHEN TRUE  THEN                        -- valid arrays
      -- do nothing, proceed
   WHEN FALSE THEN                        -- arr1 shorter than arr2
      RETURN FALSE;                       -- or raise exception?
   ELSE                                   -- at least one array empty or NULL
      RETURN NULL;
   END CASE;
   */

   pos := array_position(arr1, arr2[1]);  -- pos of arr2's 1st elem in arr1

   WHILE pos IS NOT NULL
   LOOP
      IF arr1[pos:pos+len] = arr2 THEN    -- array slice matches arr2 *exactly*
         RETURN TRUE;                     -- arr2 is part of arr1
      END IF;

      pos := pos + array_position(arr1[(pos+1):], arr2[1]);
   END LOOP; 

   RETURN FALSE;
END
$func$ LANGUAGE plpgsql IMMUTABLE COST 1000;

Considerably faster than the above for longer arrays. All other considerations still apply.

In Postgres 9.6 `arr1[pos:2147483647]` can be replaced with `arr1[pos:]` — , Mar 01 '17 at 21:12
@a_horse_with_no_name: Yes, that's better. I integrated your improvement above, thanks. — Erwin Brandstetter, Mar 01 '17 at 23:28
This runs really fast even without the index you mention. I think I need to test some more, but this seems really promising. — Philip, Mar 02 '17 at 08:39
@Philip: You commented on the other (now deleted answer): `Ah, my question should've been more clear. I want to have the exact same sequence. So {3,1,6} and {3,8,2,9,10,1,6} would be false, however {3,1,6} and {3,8,2,9,10,3,1,6} would be true.` This solution is for the more general case. *Your* case is simpler. Please edit your question to include ***all*** relevant information. — Erwin Brandstetter, Mar 02 '17 at 15:28

Array 1 contained in array 2 and elements in the same order

1 Answers1

General case

Performance

Caveats with intarray module

Simple case