0

Consider a table that stores regular expression patters.

One could query such table passing a given text for records containing patters which match given text.

This can be achieved using inverse regexp match operator ~ (by inverse I mean that text value comes first and then we specify a field containing regexp pattern like in the following example:

DROP TABLE IF EXISTS public.patterntable;
CREATE TABLE IF NOT EXISTS public.patterntable
(
    id bigint NOT NULL,
    pattern text COLLATE pg_catalog."default" NOT NULL
);

INSERT INTO patterntable (id, pattern) VALUES (1, '.*');
INSERT INTO patterntable (id, pattern) VALUES (2, '^dog');
INSERT INTO patterntable (id, pattern) VALUES (3, 'dog$');

SELECT * FROM patterntable WHERE 'x' ~ pattern;

In order to get the results the database engine runs a sequential scan which might be costly - the table can contain lots of records + many fields storing such regex patterns

My question: is there a way to index columns storing regex patterns for such lookups.

  1. in postgres (I am using last version of postgres (15.1))
  2. in any other database engine that is capable of indexing regexp patterns
Sebastian Widz
  • 1,962
  • 4
  • 26
  • 45
  • Does this answer your question? [How to create index in postgresql for regexp\_matches?](https://stackoverflow.com/questions/60409317/how-to-create-index-in-postgresql-for-regexp-matches) – Schwern Nov 26 '22 at 02:29
  • @Schwern thanks for the link, however this is not the case. The descrbed solution is for indexing text stored in a table for LIKE queries. My case is different, I have a table of patterns and query with a particular values – Sebastian Widz Nov 26 '22 at 09:09
  • pg_trgrm will work with `~`, but only with `pattern ~ 'x'`, not `'x' ~ pattern`. if your goal is to test a value against every regex in the table, perhaps you should [rethink your solution](https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem). Perhaps a [regexp union](https://ruby-doc.org/core/Regexp.html#method-c-union)? – Schwern Nov 26 '22 at 17:50

1 Answers1

0

There is no way to index patterns like that in Postgres. And I seriously doubt there is one in any RDBMS.

An index for the reverse operation, i.e. the common case of indexing strings to be searched, is easily possible with a trigram index, at least for basic patterns. See:

Indexes only work with immutable values. That is not possible for patterns that have to be evaluated against the concrete string to determine whether it matches.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • I wonder if this might be a path to follow: https://github.com/aaw/regrams and https://swtch.com/~rsc/regexp/regexp4.html i.e. by analysis of a regex pattern we could extract a list of trigrams from it, same could be done for a value used in query. Then it could be a matter of checking only records with maching trigrams. Not sure if such computation would be actually faster – Sebastian Widz Nov 26 '22 at 12:25
  • [`pg_trgm`](https://www.postgresql.org/docs/current/pgtrgm.html#id-1.11.7.44.8) will index `~`, but only `pattern ~ 'x'` not `'x' ~ pattern`. – Schwern Nov 26 '22 at 17:49
  • @Schwern: The pattern is always to the right of regex operators. And only the left side can be indexed. – Erwin Brandstetter Nov 27 '22 at 04:33
  • @ErwinBrandstetter Agreed. I phrased it confusingly. I want it to be clear to the reader of the answer that `~` can be indexed; this and others read as if it only applies to `like`. – Schwern Nov 27 '22 at 17:45