2

Can I index array columns in Postgresq 9.1 to support:

select * from t where 100 < any(a) and 100 > any(a)

Where a is an integer array (with rows that have some values like {90,110}, so the query isn't returning empty.)


Details

create table t ( a integer[] )
insert into t values ('{90,110}'::integer[])

Running explain analyze on the query above yields:

Seq Scan on t  (cost=0.00..1.07 rows=2 width=32) (actual time=0.009..0.009 rows=1 loops=1)
  Filter: ((100 < ANY (a)) AND (100 > ANY (a)))
Total runtime: 0.023 ms

Background

The following question describes an approach but doesn't seem to work for non-fixed-length arrays:

Can PostgreSQL index array columns?

The Postgres docs describe the built-in GIN operators, but they don't appear to support greater than/less than operations:

As an example, the standard distribution of PostgreSQL includes GIN operator classes for one-dimensional arrays, which support indexed queries using these operators: <@, @>, =, &&

Community
  • 1
  • 1
Bosh
  • 8,138
  • 11
  • 51
  • 77
  • 1
    `EXPLAIN ANALYZE` and schema definitions please, as well as PostgreSQL version. – Craig Ringer Nov 09 '13 at 14:58
  • 1
    The question is unclear. `select 100 < '{90,110}'::int[];` yields an error. Perhaps you're actually looking for a range type? `select 100 <@ '[90,110]'::int4range;` – Denis de Bernardy Nov 09 '13 at 15:07
  • @Denis Updated the question, thanks. In general, one could find arbitrary-sized lists of ints -- not just boundaries -- in this column. And one could query on an arbitrary number of equality and/or greater/less comparisons. – Bosh Nov 09 '13 at 15:35
  • So ... you're trying to write a fast, indexable test for "any element in this array is greater than value 'a' any element (possibly a different element) is less than value 'b'" ? – Craig Ringer Nov 10 '13 at 01:24
  • If you want *one element* to meet both constraints then your query is incorrect and won't work. – Craig Ringer Nov 10 '13 at 01:25

1 Answers1

5

You could create an index on a function that returns the bounds of your column as an int4range:

create or replace function intarray2int4range(arr int[]) returns int4range as $$
  select int4range(min(val), max(val) + 1) from unnest(arr) as val;
$$ language sql immutable;

Example:

create table t (a int[]);
insert into t
select array[i - j % 5, i - j % 3, i, i + j % 3, i + j % 5]
from generate_series(0,1000) i, generate_series(0,100) j;
create index on t using gist(a);
vacuum analyze t;

Yields:

explain analyze select * from t where 20 <@ intarray2int4range(a) limit 5;
                                                               QUERY PLAN                                                               
----------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.53..16.55 rows=5 width=41) (actual time=0.056..0.060 rows=5 loops=1)
   ->  Index Scan using t_intarray2int4range_idx on t  (cost=0.53..1669.65 rows=521 width=41) (actual time=0.055..0.058 rows=5 loops=1)
         Index Cond: (20 <@ intarray2int4range(a))
 Total runtime: 0.095 ms
(4 rows)

It would also allow you to run similar queries that scan for ranges of values:

explain analyze select * from t where '[20,30]'::int4range && intarray2int4range(a) limit 5;
                                                               QUERY PLAN                                                                
-----------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.53..11.82 rows=5 width=41) (actual time=0.125..0.130 rows=5 loops=1)
   ->  Index Scan using t_intarray2int4range_idx on t  (cost=0.53..3499.66 rows=1550 width=41) (actual time=0.123..0.126 rows=5 loops=1)
         Index Cond: ('[20,31)'::int4range && intarray2int4range(a))
 Total runtime: 0.169 ms
(4 rows)
Denis de Bernardy
  • 75,850
  • 13
  • 131
  • 154