
In an extended regular expression with backreferences, is it valid to have a backreference before the associated group?

For example, does the pattern \1(a) make sense and what does it match?

BeeOnRope
  • Did you try it? – miken32 Apr 21 '19 at 21:41
  • It doesn't make sense if it is not in a repeated group. It doesn't make sense either if it is not used within a regex engine that supports forward referencing. – revo Apr 21 '19 at 21:46
  • @miken32 - I tried it across various implementations but I didn't find any consistent pattern. – BeeOnRope Apr 21 '19 at 21:47
  • Certainly doesn't work with PCRE, and as revo said, it doesn't make any sense to write the pattern that way. – miken32 Apr 21 '19 at 21:48
  • @miken32 - agreed. The question though is what is/should be the behavior? E.g., if you are writing a regex engine, should you reject such patterns at compilation time? Should they be accepted but never match anything? Should the backref be treated as an empty string hence this pattern would match `a`? Also someone mentioned "forward references" which I know nothing about but are perhaps relevant. – BeeOnRope Apr 21 '19 at 21:53

2 Answers


On its own, the regex \1(a) doesn't produce a match in regex flavors that support forward references, because the referenced capturing group hasn't been processed yet when the backreference is attempted. Forward references do become meaningful inside a quantified group, e.g. (...)+, where a later iteration can refer to what the group captured in an earlier one. A practical use of forward references is matching nested brackets.

if you are writing a regex engine, should you reject such patterns at compilation time? Should they be accepted but never match anything?

There is no absolute answer to this. JavaScript doesn't support forward references, but it doesn't complain about them either: the backreference matches a zero-length position instead. The Boost engine throws an error, and PCRE handles it in yet another way.

Should the backref be treated as an empty string hence this pattern would match a?

That is the case with JS. In fact, there is no standard that defines such behavior. These are all engine peculiarities that someone at some point decided to implement in their own flavor.
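As one more data point on the "reject at compilation time" option, here is a minimal sketch using Python's re module. Python is not one of the engines discussed above, and the helper name compiles is made up for illustration. Python's re refuses to compile a backreference to a group that has not been defined yet, so the pattern never reaches the matching stage.

```python
import re


def compiles(pattern: str) -> bool:
    """Return True if Python's re module accepts the pattern (hypothetical helper)."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        # Raised at compile time, e.g. for a reference to an undefined group.
        return False


# Backreference placed before its group: rejected when the pattern is compiled.
forward_ok = compiles(r"\1(a)")
# Conventional order, group first and backreference second: accepted.
backward_ok = compiles(r"(a)\1")

print(forward_ok, backward_ok)  # False True
```

This only shows what one engine does. It says nothing about what JavaScript, Boost, or PCRE do with the same pattern.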

revo

\1(a) is valid; however, it may not match anything. Some constructs are useful for implementing a particular trick, even though they may not be used for their original purpose.

(?:\1(a)|(a))


This RegEx might be a rough example:

\??


Emma
  • Note that the first example `(?:\1(a)|(b))` differs in the picture (it has `(?:\1(a)|(a))`). It isn't clear to me if the `\1` accomplishes anything there? Is it equivalent to `(?:(a)|(b))` (i.e., simply removing the backref)? – BeeOnRope Apr 21 '19 at 22:29