Am I using regular expressions incorectly on the database?

Question

This works:
SELECT * FROM aTable WHERE junk='foo bar';
This hangs for ever:
SELECT * FROM aTable WHERE junk~'foo bar';
This hangs for ever:
SELECT * FROM aTable WHERE junk~'foo b.*';
What is wrong?

EDIT: is there ever a situation where the sql query is submitted but no reply is ever given? For example infinite loops, bad syntax etc.

Bad syntax will produce an immediate reply with an error. It's certainly possible to have queries that never return, though, like incorrectly written recursive CTEs (`WITH RECURSIVE` queries). Most bad queries will eventually fail when the system runs out of resources, but it's possible to write ones that'll run forever. A query blocked on a lock that's held by an idle transaction will hang indefinitely (or until the idle transaction terminates), too. BTW, did you examine `explain` output for each statement as I advised so you could see what was going on? Worth doing. — Craig Ringer, Sep 17 '12 at 04:22
If you are looking for left-anchored search terms, be sure to express it. Will be *much* faster: `junk ~ '^foo bar'`, and can be supported with an index. Generally, use `LIKE` if you can. Faster, yet. More under this [related question on dba.SE](http://dba.stackexchange.com/q/10694/3684) — Erwin Brandstetter, Sep 17 '12 at 13:21

score 2 · Accepted Answer · answered Sep 17 '12 at 02:56

2

There is nothing wrong.

Neither of your 2 queries that use regular expressions can be optimized using indexes.

So you have full scan as a result, thus it takes some time.

answered Sep 17 '12 at 02:56

zerkms

249,484
69
436
539

score 2 · Answer 2 · edited May 23 '17 at 10:25

All three are correct, but if you examine the query plans using EXPLAIN you'll see that the plans are very different.

See this SQLFiddle - note the differing execution times. Examine the query plans with "view execution plan".

In your case, the = is probably using a b-tree index lookup for a single value, which should be very fast, while the ~ versions are probably doing sequential scans - and quite CPU-expensive ones at that, because it has to try the regex against every single row.

Handily I just wrote about this in answer to another post. See this answer which explains that LIKE and SIMILAR TO can use a suitably-created index for a prefix match like you're doing, but ~ cannot.

Try creating an index like CREATE INDEX atable_junk_txt_idx ON aTable(junk text_pattern_ops) then using LIKE 'foo b%'.

Note that every extra index will slow down every insert, update and delete, so don't create indexes you don't have to.

See Using explain on the Pg wiki.

Am I using regular expressions incorectly on the database?

2 Answers2