
I am generating regexes automatically to validate URLs. To avoid clashes, I simply wrap literal text in \Q and \E. Unfortunately, I discovered that this syntax does not work as expected in PHP:

preg_match('/\Qfoo/bar\E/', 'foo/bar')
PHP Warning:  preg_match(): Unknown modifier 'b' in ... code on line ...

But it works in grep:

$ echo 'foo/bar' | grep -P '\Qfoo/bar\E'
foo/bar

And in regex101:

https://regex101.com/r/mKI0Q9/1

But not in Perl:

$ echo 'foo/bar' | perl -ne 'print $_ if m/\Qfoo/bar\E/'
Backslash found where operator expected at -e line 1, near "m/\Qfoo/bar\"

Are \Q and \E supposed to escape the delimiter?

nowox
  • In PHP the delimiters are handled somewhat crudely [beforehand](https://stackoverflow.com/questions/20705399/warning-preg-replace-unknown-modifier), not by the PCRE library which knows about \Q and \E. – mario Jan 28 '19 at 16:04
  • What would be wrong with just using `[foo/bar]`? By the way, I don't see much point of surrounding that text with `\Q ... \E` anyway, because it contains no actual regex meta characters. – Tim Biegeleisen Jan 28 '19 at 16:07
  • `/` has no special meaning to `grep`, since it doesn't use delimiters. – Barmar Jan 28 '19 at 16:07
  • @TimBiegeleisen Square brackets aren't used for grouping in regexp. – Barmar Jan 28 '19 at 16:07
  • Why don't you use `preg_quote()`? – Barmar Jan 28 '19 at 16:08
  • In your source code, `\Q` and `\E` do not handle the RE delimiter. – Corion Jan 28 '19 at 16:08
  • @TimBiegeleisen Because I am not matching `foo/bar` but anything that could be a regex such as `/\r\n.*?(?\$)` – nowox Jan 28 '19 at 16:09
  • @Barmar Yes `preg_quote()` looks like a good candidate to my particular case, but it does not answer the question title. – nowox Jan 28 '19 at 16:10
  • @WiktorStribiżew That doesn't seem to be a good dup, it doesn't discuss `\Q...\E` in source code. – Barmar Jan 28 '19 at 16:12
  • The documentation says that it ignores metacharacters in the pattern. The delimiters aren't in the pattern, they're used to find the pattern in the first place. – Barmar Jan 28 '19 at 16:14
  • @Barmar: It does now (more explicitly). // As for Perl, I believe that's more of a language tokenizer issue (never looked into that though). – mario Jan 28 '19 at 16:16
  • @WiktorStribiżew There are also lots of questions that say that `.` must be escaped if you want it treated literally. That can be escaped with `\Q` and `\E`. The delimiter is different, it can't be escaped this way. – Barmar Jan 28 '19 at 16:24

1 Answer


The PHP documentation is not explicit about this. What it says is:

\Q and \E can be used to ignore regexp metacharacters in the pattern. For example: \w+\Q.$.\E$ will match one or more word characters, followed by literals .$. and anchored at the end of the string.

However, the delimiters aren't "metacharacters in the pattern". They're used to determine where the pattern ends. So the order of operations apparently is:

  1. Find the pattern in the input string, looking for matching delimiters.
  2. Escape any special characters between \Q and \E within the pattern.
  3. Do the rest of regexp parsing.
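Step 1 can be modelled with a small sketch. This is Python rather than PHP, and `split_delimited` is a hypothetical helper written for illustration; it is not PHP's actual implementation. It only assumes what the steps above describe: the scan for the closing delimiter skips backslash-escaped characters but knows nothing about \Q and \E.

```python
def split_delimited(regex: str) -> tuple[str, str]:
    """Split '<delim>pattern<delim>modifiers' using a delimiter-first scan.

    Hypothetical model of step 1 only: \\Q and \\E get no special treatment.
    """
    delim = regex[0]
    i = 1
    while i < len(regex):
        ch = regex[i]
        if ch == "\\":
            # A backslash protects only the single character after it.
            i += 2
            continue
        if ch == delim:
            # First unescaped delimiter ends the pattern; the rest is modifiers.
            return regex[1:i], regex[i + 1:]
        i += 1
    raise ValueError("no ending delimiter found")


pattern, modifiers = split_delimited(r"/\Qfoo/bar\E/")
print(pattern)    # \Qfoo
print(modifiers)  # bar\E/
```

Under this model the pattern ends at the `/` inside `foo/bar`, so the text after it is read as modifiers, and the first of those is `b`, which is consistent with the "Unknown modifier 'b'" warning in the question.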

The Perl documentation is clearer: it describes the parsing of quoted constructs in general (regular expressions are just one particular form of this).

The first pass is finding the end of the quoted construct....
When searching for single-character delimiters, escaped delimiters and \ are skipped....
During this search no attention is paid to the semantics of the construct.

and elsewhere:

For the pattern of regex operators (qr//, m// and s///), the quoting from \Q is applied after interpolation is processed, but before escapes are processed.

But this is still after it has first found the end of the regexp.
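Because the end of the pattern is located before \Q is interpreted, one possible workaround when generating patterns is to choose a delimiter that does not occur in the quoted text. The sketch below is Python used purely as a pattern generator; `wrap_literal` and its candidate list are assumptions made for illustration, not part of PHP or Perl.

```python
def wrap_literal(literal: str, candidates: str = "/#~%@!") -> str:
    """Wrap literal text in \\Q...\\E using a delimiter absent from the text.

    Hypothetical helper: the candidate delimiters are an arbitrary choice.
    """
    if "\\E" in literal:
        # An embedded \E would end the quoted section early.
        raise ValueError("literal contains \\E and cannot be quoted this way")
    for delim in candidates:
        if delim not in literal:
            return f"{delim}\\Q{literal}\\E{delim}"
    raise ValueError("every candidate delimiter occurs in the literal")


print(wrap_literal("foo/bar"))  # #\Qfoo/bar\E#
```

The generated string can then be handed to a delimiter-based regex API. For fully arbitrary input, the `preg_quote()` function suggested in the comments, with its delimiter argument, is probably the more robust option in PHP.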

Barmar