
I'd like to use this regular expression for validating IPv6 but I want to understand everything it does https://stackoverflow.com/a/1934546/3112803

^(?>(?>([a-f0-9]{1,4})(?>:(?1)){7}|(?!(?:.*[a-f0-9](?>:|$)){8,})((?1)(?>:(?1)){0,6})?::(?2)?)|(?>(?>(?1)(?>:(?1)){5}:|(?!(?:.*[a-f0-9]:){6,})(?3)?::(?>((?1)(?>:(?1)){0,4}):)?)?(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(?>\.(?4)){3}))$/iD

but I don't know what the flag at the end does: /iD. I know the /i flag means ignore case, but I can't find what D does anywhere. That answer has been upvoted a lot, so I'm assuming it's valid, but this post says there is no D flag: https://stackoverflow.com/a/4415233/3112803

I'm trying to use this in PL/SQL and it's not validating any valid string correctly:

if ( REGEXP_LIKE(v,'/^(?>(?>([a-f0-9]{1,4})(?>:(?1)){7}|(?!(?:.*[a-f0-9](?>:|$)){8,})((?1)(?>:(?1)){0,6})?::(?2)?)|(?>(?>(?1)(?>:(?1)){5}:|(?!(?:.*[a-f0-9]:){6,})(?3)?::(?>((?1)(?>:(?1)){0,4}):)?)?(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(?>\.(?4)){3}))$/iD') ) then
gfrobenius
  • Specify what language you're using this under (the "flavor" of regex). Perl has its set of flags, as does PHP, as does PHP with preg (PCRE). – Phil Perry Feb 21 '14 at 16:57
  • @PhilPerry I updated the question with that info. I'm not trying to change the entire question. This one is just for finding out what that flag means. I'd ask a separate question about why this isn't working in `PL/SQL`. – gfrobenius Feb 21 '14 at 17:02

2 Answers


It's a flag in the PCRE flavour of Regex. See the note on the PHP.net manual page:

http://php.net/manual/en/reference.pcre.pattern.modifiers.php (under the code examples)

D (PCRE_DOLLAR_ENDONLY) - If this modifier is set, a dollar metacharacter in the pattern matches only at the end of the subject string. Without this modifier, a dollar also matches immediately before the final character if it is a newline (but not before any other newlines). This modifier is ignored if m modifier is set. There is no equivalent to this modifier in Perl.
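
The behaviour that D controls can be observed in other engines too. The sketch below uses Python's `re` module, which is not PCRE and has no D flag; it is only an analogy. In Python, a plain `$` also matches just before a trailing newline (like PCRE without D), while `\Z` matches only at the very end of the string (roughly what D asks of `$`). The pattern and the sample string are made up for illustration.

```python
import re

# Sample subject with a trailing newline (hypothetical input).
subject = "2001:db8::1\n"

# Plain `$` is satisfied just before the final newline,
# which is how PCRE behaves when D is NOT set.
loose = re.search(r"^[0-9a-f:]+$", subject)

# `\Z` is satisfied only at the very end of the string,
# which is the stricter behaviour that D requests in PCRE.
strict = re.search(r"^[0-9a-f:]+\Z", subject)

print(loose is not None)   # the trailing newline is tolerated
print(strict is not None)  # the trailing newline is rejected
```

For a validator, the strict form is usually what you want, since otherwise a value with a stray newline at the end passes.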

MDEV
  • `There is no equivalent to this modifier in Perl.` Yet another example of how the "Perl compatible" in PCRE is untrue. – Andy Lester Feb 21 '14 at 17:11
  • Interesting ... Learn something new every day. – Quixrick Feb 21 '14 at 17:18
  • I'm not concerned with `PHP` or `Perl` for this question, but I do appreciate the info. I'm trying to get IPv6 validation working in `PL/SQL` right now. So I guess that flag doesn't apply to `PL/SQL`, correct? – gfrobenius Feb 21 '14 at 18:12
  • @gfrobenius According to [regular-expressions.info](http://www.regular-expressions.info/oracle.html), Oracle only supports `i`, `c`, `n`, `m`, `x` – MDEV Feb 21 '14 at 18:16

The D flag is only valid in PCRE. The following is quoted from PHP's documentation:

D (PCRE_DOLLAR_ENDONLY)

If this modifier is set, a dollar metacharacter in the pattern matches only at the end of the subject string. Without this modifier, a dollar also matches immediately before the final character if it is a newline (but not before any other newlines). This modifier is ignored if m modifier is set. There is no equivalent to this modifier in Perl.

Summary

This regex, in the PCRE flavor, matches the following formats:

  • IPv6: 2001:0db8:85a3:0000:0000:8a2e:0370:7334
  • IPv6 with leading 0's omitted: 2001:db8:85a3:0:0:8a2e:370:7334
  • IPv6 with the longest run of consecutive all-zero groups removed (ties broken in favor of the leftmost run): 2001:db8:85a3::8a2e:370:7334, 2001:db8::1:0:0:1
  • IPv6 dotted-quad notation: ::ffff:192.0.2.128
  • IPv4: 192.0.2.128

Note that plain IPv4 is allowed, probably as a deliberate choice by the author. It can easily be disallowed by removing the ? marked with a comment in the explanation below.

The regex matches all valid IPv6 addresses according to section 2.2 of RFC 4291. However, it is not suitable for checking whether an IPv6 address is in the canonical form recommended by RFC 5952.
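
As a cross-check on the formats listed above, here is a minimal sketch that uses Python's standard `ipaddress` module instead of a regex. It is not the original author's method, and the helper name `is_ipv6` is made up for this example. Unlike the regex, it rejects plain IPv4.

```python
import ipaddress

def is_ipv6(text):
    """Return True if text parses as an IPv6 address (hypothetical helper)."""
    try:
        ipaddress.IPv6Address(text)
        return True
    except ValueError:  # AddressValueError is a subclass of ValueError
        return False

samples = [
    "2001:0db8:85a3:0000:0000:8a2e:0370:7334",  # fully expanded
    "2001:db8:85a3:0:0:8a2e:370:7334",          # leading zeros omitted
    "2001:db8:85a3::8a2e:370:7334",             # :: shorthand
    "::ffff:192.0.2.128",                       # dotted-quad tail
    "192.0.2.128",                              # plain IPv4, not IPv6
]

for s in samples:
    print(s, is_ipv6(s))

# str() yields a compressed, lowercase spelling of the address.
print(ipaddress.IPv6Address("2001:0db8:85a3:0000:0000:8a2e:0370:7334"))
```

Comparing the input against the `str()` form is one possible way to test for a compressed spelling, though whether that is enough for strict RFC 5952 conformance is an assumption you would need to verify.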

Pattern explanation

I use the term hexa-group to refer to a 16-bit group in an IPv6 address, written in colon-separated hexadecimal notation, and deci-group to refer to an 8-bit group in an IPv4 address, written in dotted-decimal notation.

^
(?>
  (?>
                                    # Below matches expanded IPv6
    ([a-f0-9]{1,4})                 # (Hexa-group) One to 4 hexadecimal digits
    (?>:(?1)){7}                    # Match 7 (: hexa-group)

    |                               # OR

                                    # Below matches shorthand notation :: IPv6
    (?!(?:.*[a-f0-9](?>:|$)){8,})   # Can't find 8 or more hexa-groups ahead
    ((?1)(?>:(?1)){0,6})?           # Match 0 to 7 hexa-groups, delimited by :
    ::                              # ::
    (?2)?                           # Match 0 to 7 hexa-groups, delimited by :
  )
  |
                                 # Below match IPv4 or IPv6 dotted-quad notation
  (?>                            
                                 # Below matches first 96-bit of IPv6
    (?>                          
                                 # Below matches expanded notation
      (?1)(?>:(?1)){5}:          # Match one hexa-group then 5 times (: hexa-group)

      |                          # OR

                                 # Below matches shorthand notation
      (?!(?:.*[a-f0-9]:){6,})    # Can't find 6 or more hexa-groups ahead
      (?3)?                      # Match 0 to 7 hexa-groups, delimited by :
      ::                         # ::
      (?>((?1)(?>:(?1)){0,4}):)? # Match 0 to 7 hexa-groups, delimited by :
    )?                           # Optional, so the regex can also match IPv4

                                 # Below matches IPv4 in dotted-decimal notation
    (25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9]) # (Deci-group) One IPv4 deci-group
    (?>\.(?4)){3}                               # Match 3 (. deci-group)
  )
)
$

You may wonder why I wrote # Match 0 to 7 hexa-groups, delimited by : for the part that matches the shorthand form of IPv6 dotted-quad notation. It is due to pattern reuse via the subroutine call (?3). However, the regex is not wrong: because of the earlier look-ahead (?!(?:.*[a-f0-9]:){6,}), it is not possible to find more than 5 hexa-groups when matching the shorthand form of IPv6 dotted-quad notation.
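
The deci-group alternation from the pattern above is easy to test in isolation. The sketch below copies that sub-pattern into Python's `re` (the sub-pattern uses no PCRE-only syntax) and checks it against whole strings; the helper name `is_deci_group` is made up for this example.

```python
import re

# The deci-group sub-pattern, copied from the regex explained above.
DECI_GROUP = re.compile(r"25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9]")

def is_deci_group(text):
    """Return True if the whole string is one IPv4 deci-group (hypothetical helper)."""
    return DECI_GROUP.fullmatch(text) is not None

# Every decimal value from 0 to 255 is accepted...
print(all(is_deci_group(str(n)) for n in range(256)))

# ...while out-of-range values and leading zeros are rejected.
for bad in ["256", "300", "01", "00", ""]:
    print(repr(bad), is_deci_group(bad))
```

The alternatives run from most to least specific (250-255, 200-249, 100-199, 0-99), and the last one, `[1-9]?[0-9]`, is what rules out leading zeros.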

Bug

By the way, there is a bug in the original regex. It fails to match ::129.144.52.38. The first atomic (non-backtracking) group (?>pattern) forbids backtracking, and the part of the pattern that matches IPv6 shorthand has no check to ensure that no IPv6 dotted-quad notation follows. To put it simply: :: can be a complete shorthand IPv6 address and can also be the prefix of an IPv6 dotted-quad notation, and without backtracking the engine fails to match ::129.144.52.38.

DEMO (Note: g and m flags are for testing purposes)

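One quick fix is to change the first > to :, which turns the outermost atomic group (?>...) into a plain non-capturing group (?:...) that can backtrack. All IPv6 addresses should then be matched as intended.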

DEMO (Note: g and m flags are for testing purposes)
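
The effect of an atomic group on backtracking can be reproduced with a toy pattern. This is a deliberately small sketch, not the IPv6 regex itself. Python's `re` only gained `(?>...)` in version 3.11, so the atomic group is emulated here with the common lookahead-plus-backreference idiom `(?=(...))\1`, which behaves the same way because a lookahead is never re-entered once it has succeeded.

```python
import re

# Emulated atomic group: equivalent in effect to (?>a|ab)c.
# The lookahead commits to the first alternative that matches ("a").
atomic = re.compile(r"(?=(a|ab))\1c")

# Ordinary non-capturing group: free to backtrack and try "ab".
plain = re.compile(r"(?:a|ab)c")

print(atomic.fullmatch("abc"))  # no match: stuck with "a", then "c" fails
print(plain.fullmatch("abc"))   # match: backtracks from "a" to "ab"
print(atomic.fullmatch("ac"))   # match: the first alternative was right
```

The bug described above has the same shape: the shorthand branch succeeds on :: first, the atomic group commits to it, and the dotted-quad branch is never tried.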

nhahtdh
  • Thank you, unfortunately it's not working for all my test cases, and doesn't work at all in `PL/SQL`. I have been trying to find an **IPv6 only** solution for a while and still can't find one. Here was my initial attempt to find one: **http://stackoverflow.com/questions/21631669/regular-expression-regex-for-ipv6-separate-from-ipv4**. Here are my latest test cases: **http://regex101.com/r/sV5cZ3**. Yours doesn't work for a few of the tests. I'm beginning to think my test cases are not valid. This question was more for just figuring out the `/D` flag. The link above is for finding a pattern. – gfrobenius Feb 21 '14 at 18:36
  • @gfrobenius: The code is not even mine. It is the same as whatever you picked up from the other question. – nhahtdh Feb 21 '14 at 18:36
  • @gfrobenius: Actually, there is a slight bug in the original regex, but it should not concern you, since what you are looking for is a regex that works in PL/SQL – nhahtdh Feb 21 '14 at 18:41
  • @gfrobenius: By the way, the original regex is testing that the whole string contains only the IPv6. If you want to test multiple strings separated by new line, you need `g` and `m` flags. http://regex101.com/r/jD2zD6 – nhahtdh Feb 21 '14 at 18:42
  • Thank you so much. I awarded you the answer on my other question related to this topic, this one was more for just the `/D` and @SmokeyPHP answered that first. – gfrobenius Feb 21 '14 at 20:25
  • @gfrobenius As I mentioned _last time_, some of your "valid" test cases are not actually valid. This is why you are still having trouble. – Michael Hampton Feb 21 '14 at 22:02