I am trying to validate an IPv4 address using the RegEx below

^((?|([0-9][0-9]?)|(1[0-9][0-9])|(2[0-5][0-5]))\.){3}(?2)$

The regex works fine for the first three octets of the IP address in most cases. But sometimes, in the last octet, it matches only the first alternative in the branch reset group and ignores the other alternatives altogether. I know that all alternatives in a branch reset group share the same capturing group number. I tried reusing the capture groups as described in this Stack Overflow post, but it worked only partially.

[Image: RegEx match results]

Sandeep Gusain

2 Answers

There is an explanation about this behaviour on this page:

https://www.pcre.org/original/doc/html/pcrepattern.html#SEC15

The documentation states:

a subroutine call to a numbered subpattern always refers to the first one in the pattern with the given number.

Using the example on that page:

(?|(abc)|(def))(?1)

Inside a (?| group, parentheses are numbered as usual, but the number is reset at the start of each branch.

The numbers will look like this

(?|(abc)|(def))
   1     1

This will match

abcabc
defabc

But it does not match

defdef

It does not match defdef because the branch reset group matches the first def, but the following (?1) can only match the pattern of the first subpattern numbered 1, which is (abc).

See a regex demo.
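To make the numbering rule concrete, here is a small Python sketch. Python's built-in `re` module supports neither branch reset groups nor subroutine calls, so this is not the PCRE pattern itself: it spells out by hand what `(?1)` expands to under the rule quoted above. The names `EXPANDED` and `matches` are illustrative only.

```python
import re

# Hand-written expansion of (?|(abc)|(def))(?1):
# (?1) re-runs the pattern of the FIRST group numbered 1, i.e. "abc",
# never "def". This is an emulation, not PCRE syntax.
EXPANDED = re.compile(r"(abc|def)abc")


def matches(text: str) -> bool:
    """Return True if the whole string matches the expanded pattern."""
    return EXPANDED.fullmatch(text) is not None


for sample in ("abcabc", "defabc", "defdef"):
    print(sample, matches(sample))
```

Under this expansion, only the first two samples match, which mirrors the behaviour described above.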

The fourth bird
  • It makes a lot more sense to me now. Thanks a lot for the example-supported explanation. I just came to know that the default Python regex module doesn't support numbered subroutine calls for now. Sorry for going off-topic – Sandeep Gusain Feb 18 '21 at 11:53
  • @SandeepGusain No problem, you are welcome. Good luck! – The fourth bird Feb 18 '21 at 11:54

The reason is that the (?2) regex subroutine re-runs the pattern of the first capturing group with the ID 2, ([0-9][0-9]?). If that pattern cannot consume the whole last octet (the $ requires the end of the string right after it), backtracking starts and the match eventually fails.

The correct approach to recurse a group of patterns is to avoid using a branch reset group and capture all alternatives into a single capturing group that will be recursed:

^(?:(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?1)$
//  |____________ Group 1 _______________|        \_ Regex subroutine

See the regex demo.
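In engines without subroutine calls, the same reuse can be approximated by building the pattern from a shared string. The following is a minimal sketch, assuming Python's built-in `re` module; `OCTET`, `IPV4`, and `is_ipv4` are hypothetical names, and the octet pattern is the one from the regex above.

```python
import re

# One octet, 0-255 (leading zeros such as "01" are accepted by this pattern).
OCTET = r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"

# Three "octet." parts followed by a final octet. The shared OCTET string
# plays the role that the (?1) subroutine call plays in PCRE.
IPV4 = re.compile(rf"(?:{OCTET}\.){{3}}{OCTET}")


def is_ipv4(text: str) -> bool:
    """Return True if the whole string is a dotted-quad IPv4 address."""
    return IPV4.fullmatch(text) is not None


print(is_ipv4("192.168.0.209"))
print(is_ipv4("256.1.1.1"))
```

`fullmatch` is used here in place of the `^` and `$` anchors; unlike `$`, it does not accept a trailing newline.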

Note that the octet pattern is a bit different; it is taken from How to Find or Validate an IP Address. Your octet pattern is wrong because 2[0-5][0-5] does not match numbers between 200 and 255 that end with 6, 7, 8, or 9 (for example, 206 or 249).
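The gap in the original octet pattern can be checked by brute force. This is a rough sketch using Python's built-in `re` module; the names `ORIGINAL`, `FIXED`, and `missed` are illustrative, and the two patterns are the octet alternations from the question and from this answer.

```python
import re

# Octet alternation from the question (without the branch reset group).
ORIGINAL = re.compile(r"[0-9][0-9]?|1[0-9][0-9]|2[0-5][0-5]")
# Octet alternation used in this answer.
FIXED = re.compile(r"25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?")


def missed(pattern: re.Pattern) -> list[int]:
    """Return every value in 0..255 that the pattern fails to match in full."""
    return [n for n in range(256) if pattern.fullmatch(str(n)) is None]


print(missed(ORIGINAL))  # values such as 206-209 and 246-249
print(missed(FIXED))     # empty list
```

The original pattern misses 20 values in total: every number from 200 to 249 whose last digit is 6 through 9.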

Wiktor Stribiżew
  • Yes, I messed up with the octet pattern. Thanks a lot for pointing it out and such a detailed explanation Wiktor! It works as expected – Sandeep Gusain Feb 18 '21 at 11:27