Is there a way to usefully index a text column containing regex patterns?

Question

I'm using PostgreSQL, currently version 9.2 but I'm open to upgrading.

In one of my tables, I have a column of type text that stores regex patterns.

CREATE TABLE foo (
    id serial,
    pattern text,
    PRIMARY KEY(id)
);
CREATE INDEX foo_pattern_idx ON foo(pattern);

Then I do queries on it like this:

INSERT INTO foo (pattern) VALUES ('^abc.*$');

SELECT * FROM foo WHERE 'abc literal string' ~ pattern;

I understand that this is sort of a reverse LIKE or reverse pattern match. If it was the other, more common way, if my haystack was in the database, and my needle was anchored, I could use a btree index more or less effectively depending on the exact search pattern and data.

But the data that I have is a table of patterns and other data associated with the patterns. I need to ask the database which rows have patterns that match my query text. Is there a way to make this more efficient than a sequential scan that checks every row in my table?

score 2 · Answer 1 · edited May 23 '17 at 11:52

There is no way.

Indexes require IMMUTABLE expressions. The result of your expression depends on the input string. I don't see any other way than to evaluate the expression for every row, meaning a sequential scan.

Related answer with more details for the IMMUTABLE angle:

Does PostgreSQL support "accent insensitive" collations?

Just that there is no workaround for your case, which is impossible to index. The index needs to store constant values in its tuples, which is just not available because the resulting value for every row is computed based on the input. And you cannot transform the input without looking at the column value.

Postgres index usage is bound to operators and only indexes on expressions left of the operator can be used (due to the same logical restraints). More:

Can PostgreSQL index array columns?

Many operators define a COMMUTATOR which allows the query planner / optimizer to flip the indexed expressions to the left. Simple example: The commutator of = is =. the commutator of > is < and vice versa. The documentation:

the index-scan machinery expects to see the indexed column on the left of the operator it is given.

The regular expression match operator ~ has no commutator, again, because that's not possible. See for yourself:

SELECT oprname, oprright::regtype, oprleft::regtype, oprcom
FROM   pg_operator
WHERE  oprname = '~'
AND    'text'::regtype IN (oprright, oprleft);

 oprname | oprright |  oprleft  | oprcom
---------+----------+-----------+------------
 ~       | text     | name      | 0
 ~       | text     | text      | 0
 ~       | text     | character | 0
 ~       | text     | citext    | 0

And consult the manual here:

oprcom ... Commutator of this operator, if any
...
Unused column contain zeroes. For example, oprleft is zero for a prefix operator.

I have tried before and had to accept it's impossible on principal.

Expression indexes require an immutable expression. But indexes don't require both sides of their operator to be immutable constants, otherwise you couldn't index anything, not even `WHERE column = 'literal'`. At worse, it should be possible to create an GIN or GIST index to do this but maybe no one's done it and maybe it's hard to do. — Tim, Dec 29 '15 at 19:40
@Tim: But indexes *do* require the *left* operand of an operator to be immutable. So, no, this is *impossible* to index on principal. I added more explanation. — Erwin Brandstetter, Dec 29 '15 at 21:44
I wasn't aware of that left operand requirement. But Postgresql lets you define your own operators, there's no reason I couldn't define my own version of ~ (named something else) that flips the arguments around and use that. That would be the easy part, the hard part, I assume, would be writing a GiST index on this new operator. — Tim, Dec 29 '15 at 23:05
@Tim: But there is a *very good reason* for why there is no commutator for `~`: it's impossible. I've tried my best to convince you. I couldn't be more happy if I was wrong. But I am not. — Erwin Brandstetter, Dec 29 '15 at 23:56

Is there a way to usefully index a text column containing regex patterns?

1 Answers1

Linked