21

To forgo reading the entire problem, my basic question is:
Is there a function in PostgreSQL to escape regular expression characters in a string?

I've probed the documentation but was unable to find such a function.

Here is the full problem:

In a PostgreSQL database, I have a column with unique names in it. I also have a process which periodically inserts names into this field, and, to prevent duplicates, if it needs to enter a name that already exists, it appends a space and parentheses with a count to the end.

i.e. Name, Name (1), Name (2), Name (3), etc.

As it stands, I use the following code to find the next number to add in the series (written in plpgsql):

var_name_id := 1;

SELECT CAST(substring(a.name from E'\\((\\d+)\\)$') AS int)
INTO var_last_name_id
FROM my_table.names a
WHERE a.name LIKE var_name || ' (%)'
ORDER BY CAST(substring(a.name from E'\\((\\d+)\\)$') AS int) DESC
LIMIT 1;

IF var_last_name_id IS NOT NULL THEN
    var_name_id = var_last_name_id + 1;
END IF;

var_new_name := var_name || ' (' || var_name_id || ')';

(var_name contains the name I'm trying to insert.)

This works for now, but the problem lies in the WHERE statement:

WHERE a.name LIKE var_name || ' (%)'

This check doesn't verify that the % in question is a number, and it doesn't account for multiple parentheses, as in something like "Name ((1))", and if either case existed a cast exception would be thrown.

The WHERE statement really needs to be something more like:

WHERE a.r1_name ~* var_name || E' \\(\\d+\\)'

But var_name could contain regular expression characters, which leads to the question above: Is there a function in PostgreSQL that escapes regular expression characters in a string, so I could do something like:

WHERE a.r1_name ~* regex_escape(var_name) || E' \\(\\d+\\)'

Any suggestions are much appreciated, including a possible reworking of my duplicate name solution.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
Benny
  • 1,508
  • 3
  • 18
  • 34
  • I realize this question is old, but it keeps turning up when searching for a Postgres function to escape special regexp characters. So I added an answer providing *that*. – Erwin Brandstetter Aug 17 '17 at 17:38

3 Answers3

42

To address the question at the top:

Assuming standard_conforming_strings = on, like it's default since Postgres 9.1.

Regular expression escape function

Let's start with a complete list of characters with special meaning in regular expression patterns:

!$()*+.:<=>?[\]^{|}-

Wrapped in a bracket expression most of them lose their special meaning - with a few exceptions:

  • - needs to be first or last or it signifies a range of characters.
  • ] and \ have to be escaped with \ (in the replacement, too).

After adding capturing parentheses for the back reference below we get this regexp pattern:

([!$()*+.:<=>?[\\\]^{|}-])

Using it, this function escapes all special characters with a backslash (\) - thereby removing the special meaning:

CREATE OR REPLACE FUNCTION f_regexp_escape(text)
  RETURNS text
  LANGUAGE sql IMMUTABLE STRICT PARALLEL SAFE AS
$func$
SELECT regexp_replace($1, '([!$()*+.:<=>?[\\\]^{|}-])', '\\\1', 'g')
$func$;

Add PARALLEL SAFE (because it is) in Postgres 10 or later to allow parallelism for queries using it.

Demo

SELECT f_regexp_escape('test(1) > Foo*');

Returns:

test\(1\) \> Foo\*

And while:

SELECT 'test(1) > Foo*' ~ 'test(1) > Foo*';

returns FALSE, which may come as a surprise to naive users,

SELECT 'test(1) > Foo*' ~ f_regexp_escape('test(1) > Foo*');

Returns TRUE as it should now.

LIKE escape function

For completeness, the pendant for LIKE patterns, where only three characters are special:

\%_

The manual:

The default escape character is the backslash but a different one can be selected by using the ESCAPE clause.

This function assumes the default:

CREATE OR REPLACE FUNCTION f_like_escape(text)
  RETURNS text
  LANGUAGE sql IMMUTABLE STRICT PARALLEL SAFE AS
$func$
SELECT replace(replace(replace($1
         , '\', '\\')  -- must come 1st
         , '%', '\%')
         , '_', '\_');
$func$;

We could use the more elegant regexp_replace() here, too, but for the few characters, a cascade of replace() functions is faster.

Again, PARALLEL SAFE in Postgres 10 or later.

Demo

SELECT f_like_escape('20% \ 50% low_prices');

Returns:

20\% \\ 50\% low\_prices
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • 2
    Thanks for the excellent answer, also great explanation. This helped me a lot! – flinz Sep 27 '19 at 09:48
  • trying this in postgres 11 and the demo for f_regex_escape is returning the input string without characters escaped. ```SELECT regexp_replace('test(1) > Foo*', '([!$()*+.:<=>?[\\]^{|}-])', '\\1', 'g')``` returned `test(1) > Foo*` – zurbergram Mar 11 '21 at 15:07
  • 1
    @zurbergram: In your example the third `\ ` went missing in two places. (May be due to the comment display, where `\ ` needs to be escaped.) It works as advertised: *db<>fiddle [here](https://dbfiddle.uk/?rdbms=postgres_13&fiddle=5208ee7d1da0a66906c19306c81013a3)* – Erwin Brandstetter Mar 11 '21 at 16:27
  • Ugh. Awesome answer but it's sad that there's no native solution – Madbreaks May 07 '21 at 15:56
  • What about single quotes? – Nick Jan 10 '23 at 15:45
  • Also, what are the dangers of this in relation to SQL Injection? – Nick Jan 10 '23 at 16:39
1

how about trying something like this, substituting var_name for my hard-coded 'John Bernard':

create table my_table(name text primary key);
insert into my_table(name) values ('John Bernard'), 
                                  ('John Bernard (1)'), 
                                  ('John Bernard (2)'), 
                                  ('John Bernard (3)');


select max(regexp_replace(substring(name, 13), ' |\(|\)', '', 'g')::integer+1) 
from my_table 
where substring(name, 1, 12)='John Bernard' 
      and substring(name, 13)~'^ \([1-9][0-9]*\)$';

 max
-----
   4
(1 row)

one caveat: I am assuming single-user access to the database while this process is running (and so are you in your approach). If that is not the case then the max(n)+1 approach will not be a good one.

  • The database will be accessed by multiple users, but the additional filters in my query (omitted in my question) will limit it to consider rows only allowed to be used by one user at a time. Thank you for the help, and I'll accept your solution of splitting the name and using substring and char_length (literally represented by 12) to compare the 2 parts separately. – Benny Feb 28 '11 at 20:35
0

Are you at liberty to change the schema? I think the problem would go away if you could use a composite primary key:

name text not null,
number integer not null,
primary key (name, number)

It then becomes the duty of the display layer to display Fred #0 as "Fred", Fred #1 as "Fred (1)", &c.

If you like, you can create a view for this duty. Here's the data:

=> select * from foo;
  name  | number 
--------+--------
 Fred   |      0
 Fred   |      1
 Barney |      0
 Betty  |      0
 Betty  |      1
 Betty  |      2
(6 rows)

The view:

create or replace view foo_view as
select *,
case
  when number = 0 then
    name
  else
    name || ' (' || number || ')'
end as name_and_number
from foo;

And the result:

=> select * from foo_view;
  name  | number | name_and_number 
--------+--------+-----------------
 Fred   |      0 | Fred
 Fred   |      1 | Fred (1)
 Barney |      0 | Barney
 Betty  |      0 | Betty
 Betty  |      1 | Betty (1)
 Betty  |      2 | Betty (2)
(6 rows)
Wayne Conrad
  • 103,207
  • 26
  • 155
  • 191
  • I considered this solution, but since inserting the data is really the only time the problem exists, the bulk of the work would be making all the display query changes. It just seems like modifying the insertion query would be a lot easier – Benny Feb 28 '11 at 20:38
  • @Benny, You could use a view for that, if you like. Please see the amended answer. – Wayne Conrad Feb 28 '11 at 20:52