
Please read before marking as duplicate

I have not been able to create or find a RegEx that works for all IPv6 formats (my test cases are below). I am aware of this question that everyone points to: Regular expression that matches valid IPv6 addresses. However, the answers there all combine IPv6 with IPv4 and/or do not work with all my test cases.

Requirements:

  1. I do not want it to also validate IPv4 values, I already have a separate validation function for IPv4.
  2. I need a pattern that works in ColdFusion and a pattern that works in PL/SQL.
  3. Because I'm using it in PL/SQL, the pattern must stay under 512 characters, and Oracle supports only a narrow subset of regular-expression syntax (for example, no lookahead, atomic groups, or recursion). So the ColdFusion pattern could end up being different from the PL/SQL pattern; that is fine, so long as they both work.
  4. The end result doesn't have to be one long RegEx; it can be split up.

Here is the latest pattern I'm trying out:

^(?>(?>([a-f0-9]{1,4})(?>:(?1)){7}|(?!(?:.*[a-f0-9](?>:|$)){8,})((?1)(?>:(?1)){0,6})?::(?2)?)|(?>(?>(?1)(?>:(?1)){5}:|(?!(?:.*[a-f0-9]:){6,})(?3)?::(?>((?1)(?>:(?1)){0,4}):)?)?(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(?>\.(?4)){3}))$

This comes close for ColdFusion but not 100%. It doesn't work at all in PL/SQL.

Test results: http://regex101.com/r/wI8cI0. The bold items are the ones the pattern doesn't work for in ColdFusion:

  1. match
  2. match
  3. match
  4. match
  5. match
  6. match (but @Michael Hampton says this should not match because it's not a valid IPv6 address; others have told me it is valid, so I'm not sure about this test case)
  7. match (:: is actually a valid format, thanks @Sander Steffann.)
  8. match
  9. no match
  10. match
  11. no match
  12. no match
  13. no match
  14. match
  15. match
  16. no match
  17. no match
  18. no match
  19. no match

I got test cases 8-11 from: http://publib.boulder.ibm.com/infocenter/iseries/v5r3/index.jsp?topic=%2Frzai2%2Frzai2ipv6addrformat.htm. I was told that tests 9 and 11 are IPv6 address prefixes, not IPv6 addresses, so they should not match.

In the end, I need the patterns to work in statements like these:

ColdFusion:

<cfset IndexOfOccurrence1=REFind("^(?>(?>([a-f0-9]{1,4})(?>:(?1)){7}|(?!(?:.*[a-f0-9](?>:|$)){8,})((?1)(?>:(?1)){0,6})?::(?2)?)|(?>(?>(?1)(?>:(?1)){5}:|(?!(?:.*[a-f0-9]:){6,})(?3)?::(?>((?1)(?>:(?1)){0,4}):)?)?(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(?>\.(?4)){3}))$",value[i])>

PL/SQL:

if ( REGEXP_LIKE(v,'^(?>(?>([a-f0-9]{1,4})(?>:(?1)){7}|(?!(?:.*[a-f0-9](?>:|$)){8,})((?1)(?>:(?1)){0,6})?::(?2)?)|(?>(?>(?1)(?>:(?1)){5}:|(?!(?:.*[a-f0-9]:){6,})(?3)?::(?>((?1)(?>:(?1)){0,4}):)?)?(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(?>\.(?4)){3}))$','i') ) then
  • Your item 6 is _not_ valid. The regex is correct. Your test case is not a valid example of an IPv4-mapped IPv6 address. Fix the test case. – Michael Hampton Feb 08 '14 at 01:37
  • OK. I did some more research, and I think item 6 isn't a valid format, as you say. But these formats are valid and the RegEx says they are not: (these two formats allow IPv6 applications to communicate directly with IPv4 applications) `0:0:0:0:0:ffff:192.1.56.10` & `::ffff:192.1.56.10/96` (these next two formats are used for tunneling; they allow IPv6 nodes to communicate across an IPv4 infrastructure) `0:0:0:0:0:0:192.1.56.10` & `::192.1.56.10/96` (from: http://publib.boulder.ibm.com/infocenter/iseries/v5r3/index.jsp?topic=%2Frzai2%2Frzai2ipv6addrformat.htm) – gfrobenius Feb 08 '14 at 15:55
  • The last two you gave in your final comment used to be valid years ago, but [are _no longer_ in use today](http://tools.ietf.org/search/rfc4291#section-2.5.5.1). The doc you linked is more than ten years old... – Michael Hampton Feb 08 '14 at 19:04
  • Don't use a regex for IPv6 parsing. It's a nightmare. – David Ehrmann Feb 21 '14 at 18:46



With much help from @nhahtdh in this answer https://stackoverflow.com/a/21943960/3112803, I have found breaking it up to be the best solution. Below is an example of how to do it in PL/SQL, but it could be done this way in other languages; I'll do the same in ColdFusion. For PL/SQL the pattern needed to stay under 512 characters, so breaking it up works well, and the result is simple to understand. It passed all my test cases in the original question.

if (
    /* IPv6 expanded */
    REGEXP_LIKE(v, '\A[[:xdigit:]]{1,4}(:[[:xdigit:]]{1,4}){7}\z')
    /* IPv6 shorthand */
    OR (NOT REGEXP_LIKE(v, '\A(.*?[[:xdigit:]](:|\z)){8}')
    AND REGEXP_LIKE(v, '\A([[:xdigit:]]{1,4}(:[[:xdigit:]]{1,4}){0,6})?::([[:xdigit:]]{1,4}(:[[:xdigit:]]{1,4}){0,6})?\z'))
    /* IPv6 dotted-quad notation, expanded */
    OR REGEXP_LIKE(v, '\A[[:xdigit:]]{1,4}(:[[:xdigit:]]{1,4}){5}:(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3}\z')
    /* IPv6 dotted-quad notation, shorthand */
    OR (NOT REGEXP_LIKE(v, '\A(.*?[[:xdigit:]]:){6}')
    AND REGEXP_LIKE(v, '\A([[:xdigit:]]{1,4}(:[[:xdigit:]]{1,4}){0,4})?::([[:xdigit:]]{1,4}:){0,5}(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3}\z'))
) then
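To sanity-check the split-up approach outside Oracle, here is a rough Python port of the four branches above. It is a sketch under assumptions: `[[:xdigit:]]` is written as `[0-9A-Fa-f]`, Oracle's `\z` becomes Python's `\Z` (absolute end of string), and the names (`is_ipv6`, `EXPANDED`, `EIGHT_GROUPS`, and so on) are hypothetical. The two `NOT` checks act as "too many groups" guards, taking the place of the lookahead in the original single pattern.

```python
import re

# Rough port of the PL/SQL checks: [[:xdigit:]] -> [0-9A-Fa-f],
# Oracle's \z -> Python's \Z (absolute end of string).
H = r"[0-9A-Fa-f]"                          # one hex digit
G = H + r"{1,4}"                            # one 16-bit group
OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])"
QUAD = OCTET + r"(?:\." + OCTET + r"){3}"   # dotted-quad tail

# IPv6 expanded: exactly eight groups.
EXPANDED = re.compile(rf"\A{G}(?::{G}){{7}}\Z")
# Guard for shorthand: eight or more groups leave nothing for "::" to stand for.
EIGHT_GROUPS = re.compile(rf"\A(?:.*?{H}(?::|\Z)){{8}}")
# IPv6 shorthand: up to seven groups on each side of "::".
SHORTHAND = re.compile(rf"\A(?:{G}(?::{G}){{0,6}})?::(?:{G}(?::{G}){{0,6}})?\Z")
# Dotted-quad notation, expanded: six groups, then an IPv4 tail.
DOTTED_EXPANDED = re.compile(rf"\A{G}(?::{G}){{5}}:{QUAD}\Z")
# Guard for dotted shorthand: six "hex:" pairs leave no room for "::".
SIX_GROUPS = re.compile(rf"\A(?:.*?{H}:){{6}}")
# Dotted-quad notation, shorthand.
DOTTED_SHORTHAND = re.compile(
    rf"\A(?:{G}(?::{G}){{0,4}})?::(?:{G}:){{0,5}}{QUAD}\Z")


def is_ipv6(v: str) -> bool:
    """Accept v if any of the four branches of the PL/SQL IF accepts it."""
    return bool(
        EXPANDED.match(v)
        or (not EIGHT_GROUPS.match(v) and SHORTHAND.match(v))
        or DOTTED_EXPANDED.match(v)
        or (not SIX_GROUPS.match(v) and DOTTED_SHORTHAND.match(v))
    )
```

Each piece is a short, separate pattern, so each one stays well under Oracle's 512-character limit. When a test case fails, you can also trace it to a single branch.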

As far as I can tell from my research, there is no RegEx that works for all IPv6 formats. Even if there were, it would be complex and hard to maintain (not easily readable), and it might also cause performance problems. Hence I decided to write a method (function) for this instead, and you can easily add any special cases you need. I have written it in C#, but the algorithm should convert to any language:

using System.Text.RegularExpressions;

class IPv6Validator
{
    string charValidator = @"[A-Fa-f0-9]";
    string IPv4Validation = @"^((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$";

    public bool IsIPv6(string maybeIPv6)
    {
        if (maybeIPv6 == "::")
        {
            return true;
        }

        int numberOfEmptyDigitGroups = 0;
        int expectedDigitGroupsLength = 8;
        string[] arrMaybeIPv6 = maybeIPv6.Split(':');

        if (arrMaybeIPv6.Length > 9 || arrMaybeIPv6.Length < 3)
        {
            return false;
        }

        for (int i = 0; i < arrMaybeIPv6.Length; i++)
        {
            //IF IPv6 starts or ends with "::" (ex ::1)
            if ((i == 0 || i == arrMaybeIPv6.Length - 2) && IsEmptyDigitGroup(arrMaybeIPv6[i]) && IsEmptyDigitGroup(arrMaybeIPv6[i+1]))
            {
                expectedDigitGroupsLength = 9;
                numberOfEmptyDigitGroups++;
                i++;
            }
            else if (arrMaybeIPv6[i].Trim() == string.Empty) //If IPv6 contains :: (ex 1:2::3)
            {
                numberOfEmptyDigitGroups++;
            }

            //Cannot have more than one "::"  (ex ::1:2::3)
            if (numberOfEmptyDigitGroups > 1)
            {
                return false;
            }

            //Mapped IPv4 control
            if (i == arrMaybeIPv6.Length - 1 && IsIPv4(arrMaybeIPv6[i]) && arrMaybeIPv6.Length < 8)
            {
                return true;
            }
            else if (i == arrMaybeIPv6.Length - 1 && HasSpecialCharInIPv6(arrMaybeIPv6[i], IsEmptyDigitGroup(arrMaybeIPv6[i - 1]))) //If last digit group contains special char (ex fe80::3%eth0)
            {
                return true;
            }
            else //if not IPV4, check the digits
            {
                //Cannot have more than 4 digits (ex 12345:1::)
                if (arrMaybeIPv6[i].Length > 4)
                {
                    return false;
                }

                //Check if it has an invalid char
                foreach (char ch in arrMaybeIPv6[i])
                {
                    if (!IsIPv6Char(ch.ToString()))
                    {
                        return false;
                    }
                }
            }

            //Checks if it has extra digit (ex 1:2:3:4:5:6:7:8f:)
            if (i >= expectedDigitGroupsLength)
            {
                return false;
            }

            //If a digit group is missing at the start or end (ex 1:2:3:4:5:6:7:)
            if ((i == 0 || i == arrMaybeIPv6.Length - 1) && IsEmptyDigitGroup(arrMaybeIPv6[i]) && expectedDigitGroupsLength != 9)
            {
                return false;
            }

            //If it has missing digits (ex 1:2:3:4:5:6)
            if (i == arrMaybeIPv6.Length - 1 && numberOfEmptyDigitGroups == 0 && arrMaybeIPv6.Length < 8)
            {
                return false;
            }
        }

        return true;
    }

    bool IsIPv4(string lastDigitGroup)
    {
        //If lastDigitGroup has special char, then get the first group for IPV4 validation (ex ::123.12.2.1/60)
        string maybeIPv4 = lastDigitGroup.Split('/','%')[0];

        Match match = Regex.Match(maybeIPv4, IPv4Validation);
        return match.Success;
    }

    bool IsIPv6Char(string strChar)
    {
        Match match = Regex.Match(strChar, charValidator);
        return match.Success;
    }

    bool IsSpecialChar(char ch)
    {
        if (ch == '%' || ch == '/')
        {
            return true;
        }
        return false;
    }

    bool HasSpecialCharInIPv6(string lastDigitGroup, bool isPreviousDigitGroupEmpty)
    {
        for (int i = 0; i < lastDigitGroup.Length; i++)
        {
            //If no special char is found in the first 5 chars, leave the for loop
            if (i == 5)
                break;

            //If the first digit is special char, check the previous digits to be sure it is a valid IPv6 (ex FE80::/10)
            if (i == 0 && IsSpecialChar(lastDigitGroup[i]) && isPreviousDigitGroupEmpty)
                return true;

            if (i != 0 && IsSpecialChar(lastDigitGroup[i]))
                return true;

            if (!IsIPv6Char(lastDigitGroup[i].ToString()))
                return false;
        }
        return false;
    }

    bool IsEmptyDigitGroup(string digitGroup)
    {
        if (digitGroup.Trim() == string.Empty)
            return true;

        return false;
    }

}

I also posted other methods, such as how to search for IPv6 addresses in text or files, on the other question: Regular expression that matches valid IPv6 addresses

Edit summary: IPv4-mapped addresses and special characters (% zone IDs and / prefixes) are now covered, e.g. "::123.23.23.23", "fe80::3%eth0", "::ffff:192.1.56.10/96".

  • "_As far as I research, there is no RegEx that works for all IPv6 formats._" See my answer [here](https://stackoverflow.com/a/74503707/3745413) for an IPv6 regular expression that matches all valid IPv6 address formats. – Ron Maupin Mar 17 '23 at 11:30

:: is a valid IPv6 address (the all-zeroes address), so why not accept it?

And if you don't want to accept IPv6 addresses with the last 32 bits written in IPv4 notation (why wouldn't you? They are valid address representations), then just remove the last part of the regex that deals with them (starting with `::(ffff`).

Anyway, the regex does indeed contain a few errors in the IPv4-notation part. The IPv4 notation is just a different way to write the last 32 bits of the IPv6 address, and the regex doesn't handle all valid variants of that. It also fails to escape the `.`, so it will accept many invalid strings.
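The following small Python sketch makes the "last 32 bits" point concrete. It rewrites a dotted-quad tail as the two hex groups it encodes. The function name `dotted_quad_to_hextets` is hypothetical, and the standard-library `ipaddress` module is used only to confirm that both spellings parse to the same address.

```python
import ipaddress


def dotted_quad_to_hextets(quad: str) -> str:
    """Rewrite an IPv4 tail a.b.c.d as the two 16-bit hex groups it encodes."""
    octets = [int(part) for part in quad.split(".")]
    if len(octets) != 4 or not all(0 <= o <= 255 for o in octets):
        raise ValueError(f"not a dotted quad: {quad!r}")
    a, b, c, d = octets
    # Each pair of octets forms one 16-bit group: (high << 8) | low.
    return f"{(a << 8) | b:x}:{(c << 8) | d:x}"


if __name__ == "__main__":
    tail = dotted_quad_to_hextets("192.1.56.10")
    print(tail)  # c001:380a
    # The standard-library parser sees both spellings as the same address.
    print(ipaddress.IPv6Address("::ffff:192.1.56.10")
          == ipaddress.IPv6Address("::ffff:" + tail))  # True
```

So `::ffff:192.1.56.10` and `::ffff:c001:380a` are one address written two ways. This is why a validator has to accept the dotted form in every position where the last two hex groups could appear.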

  • About `::`, I didn't think of that, cool, I'll accept that. About the other, I do want to accept `ABCD:ABCD:ABCD:ABCD:ABCD:ABCD:192.168.158.190` it's saying it's not valid. http://regex101.com/r/mO4hJ2 This is the RegExp I've been using for IPv4 `(((2[0-4]\d|25[0-5]|[01]?\d\d?)\.){3}(2[0-4]\d|25[0-5]|[01]?\d\d?))` I'm not sure how to add that in to fix the IPv4 issues you mention. – gfrobenius Feb 07 '14 at 17:43
  • I don't understand, then why do you ask "I need a RegEx for IPv6 without it being combined with IPv4" ? – Sander Steffann Feb 07 '14 at 21:38
  • If you do want to accept IPv4 addresses notation then you need to change the last part of the regex so that it accepts each possible position of the double colon (the first part of the regex) with IPv4 notation at the end. I would advise to use existing address parsing routines instead of trying to put everything into one complex (and therefore error-prone) regex... – Sander Steffann Feb 07 '14 at 21:42
  • What I mean by "I need a RegEx for IPv6 without it being combined with IPv4" is: in the other post I linked to, they are all examples of validating either a sole IPv4 value (eg 255.255.255.0) or an IPv6 value (eg ABCD:ABCD:ABCD:ABCD:ABCD:ABCD). I want IPv6 only, but it should allow the IPv4-mapped IPv6 format (45 bytes): `ABCD:ABCD:ABCD:ABCD:ABCD:ABCD:192.168.158.190` (http://stackoverflow.com/a/7477384/3112803). The latter is what I can't seem to get working. **http://regex101.com/r/dG0cN9** Thanks, I'll keep fooling with it. – gfrobenius Feb 07 '14 at 22:45
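Following the advice above to use an existing address-parsing routine rather than one large regex, here is a minimal Python sketch built on the standard `ipaddress` module, which already implements the textual IPv6 rules. The wrapper `is_ipv6_by_parser` and its `allow_prefix` flag are hypothetical. This cannot be called from PL/SQL or ColdFusion directly. It is shown only as a reference to compare a regex against.

```python
import ipaddress


def is_ipv6_by_parser(text: str, allow_prefix: bool = False) -> bool:
    """Validate with Python's ipaddress parser instead of a regex.

    allow_prefix=False: a bare address only, e.g. "::ffff:192.1.56.10".
    allow_prefix=True: also an address with a /length, e.g.
    "::ffff:192.1.56.10/96" (IPv6Interface keeps the host bits).
    """
    try:
        if allow_prefix:
            ipaddress.IPv6Interface(text)
        else:
            ipaddress.IPv6Address(text)  # rejects any "/" in the input
        return True
    except ValueError:  # AddressValueError and NetmaskValueError subclass it
        return False
```

Keeping prefixes behind a flag matches the distinction in the question: test cases 9 and 11 are prefixes, not addresses.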

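This is a comprehensive IPv6 regular expression that tests all the valid IPv6 text notations (expanded, compressed, expanded-mixed, compressed-mixed) with an optional prefix length. It also captures the various parts into capture groups; you can make a group non-capturing by putting ?: right after its opening paren. The pattern is written in free-spacing mode with # comments and uses [0-9A-F], so compile it with the free-spacing and case-insensitive options (for example, RegexOptions.IgnorePatternWhitespace | RegexOptions.IgnoreCase in .NET, or re.VERBOSE | re.IGNORECASE in Python).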

This is the regular expression I created and use in my IPvX IP calculator for both IPv4 and IPv6.

^# Anchor
(?:# BEGIN Address (wraps both branches so the anchors and the optional length apply to each)
  (# BEGIN Compressed-mixed                                         *** Group 1 ***
    (# BEGIN Hexadecimal Notation                                   *** Group 2 ***
       (?:
         (?:[0-9A-F]{1,4}:){5}[0-9A-F]{1,4}            # No ::
       | (?:[0-9A-F]{1,4}:){4}:[0-9A-F]{1,4}           # 4::1
       | (?:[0-9A-F]{1,4}:){3}(?::[0-9A-F]{1,4}){1,2}  # 3::2
       | (?:[0-9A-F]{1,4}:){2}(?::[0-9A-F]{1,4}){1,3}  # 2::3
       | [0-9A-F]{1,4}:(?::[0-9A-F]{1,4}){1,4}         # 1::4
       | (?:[0-9A-F]{1,4}:){1,5}                       # :: End
       | :(?::[0-9A-F]{1,4}){1,5}                      # :: Start
       | :                                             # :: Only
       ):
    )# END Hexadecimal Notation
    (# BEGIN Dotted-decimal Notation                                *** Group 3 ***
       (25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\. # 0 to 255.  *** Group 4 ***
       (25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\. # 0 to 255.  *** Group 5 ***
       (25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\. # 0 to 255.  *** Group 6 ***
       (25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])   # 0 to 255   *** Group 7 ***
    )# END Dotted-decimal Notation
  )# END Compressed-mixed
  |
  (# BEGIN Compressed                                               *** Group 8 ***
     (?:# BEGIN Hexadecimal Notation
       (?:[0-9A-F]{1,4}:){7}[0-9A-F]{1,4}              # No ::
     | (?:[0-9A-F]{1,4}:){6}:[0-9A-F]{1,4}             # 6::1
     | (?:[0-9A-F]{1,4}:){5}(?::[0-9A-F]{1,4}){1,2}    # 5::2
     | (?:[0-9A-F]{1,4}:){4}(?::[0-9A-F]{1,4}){1,3}    # 4::3
     | (?:[0-9A-F]{1,4}:){3}(?::[0-9A-F]{1,4}){1,4}    # 3::4
     | (?:[0-9A-F]{1,4}:){2}(?::[0-9A-F]{1,4}){1,5}    # 2::5
     | [0-9A-F]{1,4}:(?::[0-9A-F]{1,4}){1,6}           # 1::6
     | (?:[0-9A-F]{1,4}:){1,7}:                        # :: End
     | :(?::[0-9A-F]{1,4}){1,7}                        # :: Start
     | ::                                              # :: Only
     )  # END Hexadecimal Notation
  )# END Compressed
)# END Address
  (?:# BEGIN Optional Length
       /(12[0-8]|1[0-1][0-9]|[1-9]?[0-9])              # /0 to /128 *** Group 9 ***
  )? # END Optional Length
$# Anchor
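Here is a minimal Python sketch of how a pattern like this is used. It needs the verbose and case-insensitive flags, and `fullmatch` so that a trailing newline is rejected. The copy below is condensed: comments and capture groups are removed. As an explicit assumption, both top-level branches are also wrapped in one `(?:...)`. Alternation has the lowest precedence, so in a bare `^A|B$` the `^` binds only to the first branch, and the `$` and the optional `/length` only to the second. The function name `is_ipv6_re` is hypothetical.

```python
import re

# Condensed copy of the pattern: no comments or capture groups, and both
# branches wrapped in one (?:...) so ^, $ and /length apply to each of them.
IPV6_PATTERN = re.compile(r"""
^(?:
    (?:                                         # hex groups + dotted-quad tail
        (?:
            (?:[0-9A-F]{1,4}:){5}[0-9A-F]{1,4}
          | (?:[0-9A-F]{1,4}:){4}:[0-9A-F]{1,4}
          | (?:[0-9A-F]{1,4}:){3}(?::[0-9A-F]{1,4}){1,2}
          | (?:[0-9A-F]{1,4}:){2}(?::[0-9A-F]{1,4}){1,3}
          | [0-9A-F]{1,4}:(?::[0-9A-F]{1,4}){1,4}
          | (?:[0-9A-F]{1,4}:){1,5}
          | :(?::[0-9A-F]{1,4}){1,5}
          | :
        ):
        (?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])
        (?:\.(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])){3}
    )
  | (?:                                         # hex groups only
        (?:[0-9A-F]{1,4}:){7}[0-9A-F]{1,4}
      | (?:[0-9A-F]{1,4}:){6}:[0-9A-F]{1,4}
      | (?:[0-9A-F]{1,4}:){5}(?::[0-9A-F]{1,4}){1,2}
      | (?:[0-9A-F]{1,4}:){4}(?::[0-9A-F]{1,4}){1,3}
      | (?:[0-9A-F]{1,4}:){3}(?::[0-9A-F]{1,4}){1,4}
      | (?:[0-9A-F]{1,4}:){2}(?::[0-9A-F]{1,4}){1,5}
      | [0-9A-F]{1,4}:(?::[0-9A-F]{1,4}){1,6}
      | (?:[0-9A-F]{1,4}:){1,7}:
      | :(?::[0-9A-F]{1,4}){1,7}
      | ::
    )
)
(?:/(?:12[0-8]|1[0-1][0-9]|[1-9]?[0-9]))?       # optional /0 to /128
$
""", re.VERBOSE | re.IGNORECASE)


def is_ipv6_re(text: str) -> bool:
    """fullmatch also rejects a trailing newline, which a bare $ would allow."""
    return IPV6_PATTERN.fullmatch(text) is not None
```

With the wrapper in place, a mixed address with a prefix, such as `::ffff:192.1.56.10/96`, is accepted as well.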

Bonus IPv4 regular expression:

^# Anchor
  (?:# BEGIN Dotted-decimal Notation
       (25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\. # 0 to 255.  *** Group 1 ***
       (25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\. # 0 to 255.  *** Group 2 ***
       (25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\. # 0 to 255.  *** Group 3 ***
       (25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])   # 0 to 255   *** Group 4 ***
  )  # END Dotted-decimal Notation
  (?:# BEGIN Optional Length
       /(3[0-2]|[1-2]?[0-9])                           # /0 to /32  *** Group 5 ***
  )? # END Optional Length
$# Anchor