3

I have this regular expression ^(?!.*?([aceg]).*?\1)(?!.*?([i])(?:.*?\2){2})[acegi]+$ which works as expected (e.g. in Ruby), but not in PostgreSQL due to "invalid backreference number".

How to solve it and keep given functionality?

Part of the SQL command: WHERE (name ~ '^(?!.*?([aceg]).*?\1)(?!.*?([i])(?:.*?\2){2})[acegi]+$')

Note: I tried escaping backslash like \\ without any error, but PG were returning invalid matches (like "aaa").

Community
  • 1
  • 1
Nikos
  • 515
  • 1
  • 4
  • 16

2 Answers2

4

The problem with Postgresql is that first, it doesn't support having capture groups within its lookahead assertions. That said, all capture groups within a lookahead will be treated as non-capture groups ((?: ... )), emphasis mine:

Lookahead constraints cannot contain back references (see Section 9.7.3.3), and all parentheses within them are considered non-capturing.[1]

So even if PostgreSQL did support having backreferences within a lookahead, it would still be failing to work as expected due to the above constraint (without a capture group, you cannot have a backreference).


A possible workaround (will be lengthy for complex requirements unfortunately) would be to count the number of each character:

WHERE
    LENGTH(REGEXP_REPLACE(name, '[^a]+', '', 'g')) < 2 AND
    LENGTH(REGEXP_REPLACE(name, '[^c]+', '', 'g')) < 2 AND
    LENGTH(REGEXP_REPLACE(name, '[^e]+', '', 'g')) < 2 AND
    LENGTH(REGEXP_REPLACE(name, '[^g]+', '', 'g')) < 2 AND
    LENGTH(REGEXP_REPLACE(name, '[^i]+', '', 'g')) < 3 AND
    LENGTH(REGEXP_REPLACE(name, '[acegi]+', '', 'g')) = 0;

[condition taken and modified from this answer; the last row is to ensure there are only those characters in the string]

Community
  • 1
  • 1
Jerry
  • 70,495
  • 13
  • 100
  • 144
  • TY, I guess it cannot be simplified? `LENGTH(REGEXP_REPLACE(name, '[^aceg]+', '', 'g')) < 2 AND` – Nikos Mar 31 '14 at 21:48
  • @Nikos I don't think so. If say, you have `name = 'aaccee'` which is valid since all the letters appear less than 2 times, and use the above, the replace will remove all except a, c, e and g letters, which means it will remove nothing. The LENGTH will return 6 (less than 2) and will be 'matched'. – Jerry Apr 01 '14 at 08:24
  • It didn't work for me with that last condition so I replaced it with regex `^[acegi]+$`. – Nikos Apr 03 '14 at 21:09
  • @Nikos I just realised. The last condition should actually be only `[acegi]+` with the same logic I mentioned in my previous comment. Sorry about that :( – Jerry Apr 04 '14 at 04:49
1

Lookahead constraints may not contain back references.

Craig Ringer
  • 307,061
  • 76
  • 688
  • 778
Robert Krzyzanowski
  • 9,294
  • 28
  • 24
  • 4
    Please link to the current documentation, not an ancient version. In this case that restriction remains in place, but people often work from old documentation and make bad decisions based on it. – Craig Ringer Mar 31 '14 at 01:31