2

Note: I heard something about normal forms, data normalization etc but it is theoretical question, just for learning.

Ok, there is some playground:

--drop table if exists t;
create table t(i bigserial, x int[]);

insert into t(x)
  select
    array[(random()*100)::int,(random()*100)::int,(random()*100)::int]
  from generate_series(1,100000);
create index idx_t_x on t using gin(x);

analyze t;

The index works perfectly with array operators like:

nd@postgres=# explain analyse select * from t where array[5] <@ x;
╔═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╗
║                                                      QUERY PLAN                                                       ║
╠═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╣
║ Bitmap Heap Scan on t  (cost=33.88..1004.17 rows=2823 width=41) (actual time=2.453..6.724 rows=2830 loops=1)          ║
║   Recheck Cond: ('{5}'::integer[] <@ x)                                                                               ║
║   Heap Blocks: exact=881                                                                                              ║
║   ->  Bitmap Index Scan on idx_t_x  (cost=0.00..33.17 rows=2823 width=0) (actual time=2.205..2.205 rows=2830 loops=1) ║
║         Index Cond: ('{5}'::integer[] <@ x)                                                                           ║
║ Planning time: 2.501 ms                                                                                               ║
║ Execution time: 7.616 ms                                                                                              ║
╚═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╝

But I interesting in any or all conditions like constant = any(field) or constant < all(field) and so on:

nd@postgres=# explain analyse select * from t where 5 > any(x);
╔════════════════════════════════════════════════════════════════════════════════════════════════════════╗
║                                               QUERY PLAN                                               ║
╠════════════════════════════════════════════════════════════════════════════════════════════════════════╣
║ Seq Scan on t  (cost=0.00..3185.00 rows=98266 width=41) (actual time=0.044..61.822 rows=12961 loops=1) ║
║   Filter: (5 > ANY (x))                                                                                ║
║   Rows Removed by Filter: 87039                                                                        ║
║ Planning time: 0.186 ms                                                                                ║
║ Execution time: 64.543 ms                                                                              ║
╚════════════════════════════════════════════════════════════════════════════════════════════════════════╝

Is there any way to create useful index for such queries?

There is some info I learned from this answer for example, but probably something changed from the 2015?

On my old-fashioned sight there is not too hard algorithmically to handle index like

    -v1
   /
row--v2
   \ ...
    -vn

so there is should be some easy and obvious solution.

Community
  • 1
  • 1
Abelisto
  • 14,826
  • 2
  • 33
  • 41

0 Answers0