2

What's the proper way to add a literal text value from a field to a regex in postgres?

For example, something like this where some_field could contain invalid regex syntax if left unescaped:

where some_text ~* ('\m' || some_field || '\M');
jadams
  • 23
  • 2

1 Answers1

1

The easiest thing to do is to use a regex to prep your string to be in a regex. Escaping non-word characters in your string should be sufficient to make it regex-safe, for example:

=> select regexp_replace('. word * and µ{', E'([^\\w\\s])', E'\\\\\\1', 'g');
   regexp_replace   
--------------------
 \. word \* and µ\{

So something like this should work in general:

where some_text ~* x || regexp_replace(some_field, E'([^\\w\\s])', E'\\\\\\1', 'g') || y

where x and y are the other parts of the regex.

If you didn't need a regex at the end (i.e. no y above), then you could use (?q):

An ARE can begin with embedded options: a sequence (?xyz) (where xyz is one or more alphabetic characters) specifies options affecting the rest of the RE.

and a q means that the:

rest of RE is a literal ("quoted") string, all ordinary characters

So you could use:

where some_text ~* x || '(?q)' || some_field

in this limited case.

mu is too short
  • 426,620
  • 70
  • 833
  • 800
  • @mu is there any other way that would work without escaping all non-word characters? I have a similar problem but my strings contain a lot of non-word characters that I fear will have strange behaviors if escaped. – João Almeida Sep 10 '15 at 13:47
  • @JoãoAlmeida: PostgreSQL uses pretty strict [POSIX regexes](http://www.postgresql.org/docs/current/interactive/functions-matching.html#FUNCTIONS-POSIX-REGEXP) so I think escaping `[^\w]` is pretty safe, you're not going to produce anything regex-ish accidentally and you don't have to worry about `(` vs `\(` as with some regexes. The `(?q)` approach might be more suitable for you though. – mu is too short Sep 10 '15 at 18:13
  • @muistooshort thanks. The `(?q)` approach would be perfect for me if I didn't need to insert a regex after my string. – João Almeida Sep 11 '15 at 09:20