0

I just stumbled upon a curious behavior of regexp_replace PostgreSQL function. It looks like a bug but I always doubt myself first. When I run

SELECT regexp_replace(E'1%2_3', '([_%])', E'\\ \\1', 'g')

it correctly prefixes either underscore or percent with backslash+space and produces "1\ %2\ _3". However when I remove space (it doesn't have to be space, can be any character)

SELECT regexp_replace(E'1%2_3', '([_%])', E'\\\\1', 'g')

it stops using captured parenthesized expression in substitution and produces "1\12\13" instead of "1\%2\_3". I would appreciate if someone could tell me what am I doing wrong. I simply need to add backslash before certain characters in a string.

UPDATE: I was able to achieve the desired behavior by running

SELECT regexp_replace(E'1%2_3', '([_%])', E'\\\\\\1', 'g')

My original example still seems a bit illogical and inconsistent. The inconsistency is that using the same E'...' syntax 4 backslashes may produce different result.

oᴉɹǝɥɔ
  • 1,796
  • 1
  • 18
  • 31

2 Answers2

1

In your second query, after the backslash escapES are processed at the string level, you have the replacement string \\1.

What's happening is that the escaped backslash prevents \1 from being recognized as a back-reference. You need another set of backslashes, so that the replacement string is \\\1 to get a literal backslash and a back-reference. Since every literal backslash needs to be escaped, you need to double all of them.

SELECT regexp_replace(E'1%2_3', '([_%])', E'\\\\\\1', 'g')
Barmar
  • 741,623
  • 53
  • 500
  • 612
1

I would not use the outdated Posix escape syntax in Postgres without need in the first place. Are you running an outdated version with standard_conforming_strings = off? Because if you are not, simplify:

SELECT regexp_replace('1%2_3', '([_%])', '\\\1', 'g')

You only need to add a single \ to escape the special meaning of \ in the regexp pattern.

Strings prefixed with E have to be processed, which costs a tiny bit extra and there is always the risk of unintended side effects with special characters. It's pointless to write E'1%2_3' for a string you want to provide as is. Make that just '1%2_3' in any case.


And for just just two characters to replace simple use:

SELECT replace(replace('1%2_3', '_', '\_'), '%', '\%')

Regular expressions are powerful, but for a price. Even several nested simple replace() calls are cheaper than a single regexp_replace().

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228