
I am trying to create a regex that matches anything that is not formatted as ip|port.
A port value must be in the range [1, 65535].

Example data:
(1) 8.8.8.8|0 (bad: port 0 not allowed)
(2) 8.8.8.8|1 (good)
(3) 8.8.8.8|65536 (bad: port > 65535)
(4) 8.8.8.8|dawda (bad: char)

The regex, which matches bad data, should match (1), (3) and (4).

Assume the IP part is always valid (no need to check it with the regex); only the port needs checking. Because of that, I started from the end of the line:

Regex to match a port between 1 and 65535:
\|(6553[0-5]|655[0-2]\d|65[0-4]\d{2}|6[0-4]\d{3}|[1-5]\d{4}|[1-9]\d{0,3})

Regex with end of line matching:
\|(6553[0-5]|655[0-2]\d|65[0-4]\d{2}|6[0-4]\d{3}|[1-5]\d{4}|[1-9]\d{0,3})$
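As a quick sanity check of the anchored pattern above, here is a minimal PHP sketch that runs it against the four sample lines. The variable names `$validPort`, `$samples` and `$results`, and the `~` delimiter, are choices made for this illustration only.

```php
<?php
// Anchored pattern from the question: matches lines ending in a port in [1, 65535].
$validPort = '~\|(6553[0-5]|655[0-2]\d|65[0-4]\d{2}|6[0-4]\d{3}|[1-5]\d{4}|[1-9]\d{0,3})$~';

$samples = ['8.8.8.8|0', '8.8.8.8|1', '8.8.8.8|65536', '8.8.8.8|dawda'];

$results = [];
foreach ($samples as $line) {
    // preg_match returns 1 on a match and 0 when there is no match.
    $results[$line] = preg_match($validPort, $line) === 1;
    echo $line, ' => ', $results[$line] ? 'valid port' : 'bad', PHP_EOL;
}
```

Only sample (2) should be reported as a valid port, which is the set that the negated regex has to exclude.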

Now I want to negate it to catch lines that do not end with a valid port. I looked at other questions (How to negate specific word in regex?, Regular Expressions and negating a whole character group) and learned about negative lookahead.

According to those posts, a regex using negative lookahead should look like this:

^(?!(MY_REGEX)).*$

I modified my regex and added .* for the IP part so that the pattern connects to the ^ anchor.

Negative regex at end of line:
^(?!.*\|(6553[0-5]|655[0-2]\d|65[0-4]\d{2}|6[0-4]\d{3}|[1-5]\d{4}|[1-9]\d{0,3})).*$

The problem is the trailing .*$, which allows anything after the port number. In the end, this regex will be executed from PHP. PHP (PCRE) does not support variable-length lookbehind, which is why I chose lookahead in the first place.

Thanks for the help.

Vincent L.
    I would recommend using a simple regex for basic syntax validation, with a capturing group on the number after the `|`. Then, in your code, use a method to validate the number's range. Regex doesn't do math very well. – CAustin Aug 03 '17 at 22:17

1 Answer

The most appropriate approach is to capture the part after | with (.*) (any 0+ chars other than line break chars) and validate it with a bit of PHP code:

if (preg_match('~^\d+(?:\.\d+){3}\|(.*)$~', $s, $res)) {
    if (ctype_digit($res[1]) && intval($res[1]) > 0 && intval($res[1]) < 65536 ) { // valid port, omit
        echo "The port is valid: " . $res[1];
    } else {
        echo "Invalid port: " . $res[1];
    }
}

The ctype_digit function checks whether the string contains only digits.
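To illustrate how ctype_digit behaves on a few edge cases, here is a small sketch. The helper `isValidPort` is a hypothetical name introduced for this example; it only repeats the condition from the snippet above.

```php
<?php
// Hypothetical helper (not part of the original answer) mirroring the check above.
function isValidPort(string $port): bool
{
    return ctype_digit($port) && intval($port) > 0 && intval($port) < 65536;
}

var_dump(ctype_digit('8080'));  // digits only
var_dump(ctype_digit('dawda')); // letters
var_dump(ctype_digit(''));      // empty string
var_dump(isValidPort('0080'));  // leading zeros pass ctype_digit; intval() yields 80
```

Note one difference between the two approaches: this check accepts a port written with leading zeros such as 0080, whereas the pure-regex pattern treats it as invalid because every valid alternative must start with a non-zero digit.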

See the online PHP demo.

If you just need a pattern that will be PCRE compatible, you may use the following pattern:

^\d+(?:\.\d+){3}\|(?!(?:[1-9]\d{0,3}|[1-5]\d{4}|6[0-4]\d{3}|65[0-4]\d{2}|655[0-2]\d|6553[0-5])$)(.*)$

See the regex demo

See the details below:

  • ^ - start of string
  • \d+ - 1+ digits
  • (?:\.\d+){3} - 3 sequences of a . followed with 1+ digits (an IP pattern that needs no validation, you consider it pre-validated)
  • \| - a literal |
  • (?!(?:[1-9]\d{0,3}|[1-5]\d{4}|6[0-4]\d{3}|65[0-4]\d{2}|655[0-2]\d|6553[0-5])$) - a negative lookahead that will fail the match if it finds the following numeric values at the end of the string:
    • [1-9]\d{0,3} - a digit from 1 to 9 and then 0 to 3 digits (1 to 9999)
    • [1-5]\d{4} - a digit from 1 to 5 and then 4 digits (10000 to 59999)
    • 6[0-4]\d{3} - 6, a digit from 0 to 4 and then 3 digits (60000 to 64999)
    • 65[0-4]\d{2} - 65, a digit from 0 to 4, and 2 digits (65000 to 65499)
    • 655[0-2]\d - 655, a digit from 0 to 2, and 1 digit (65500 to 65529)
    • 6553[0-5] - 65530 to 65535.
  • (.*) - capture the part that is not a valid port, any 0+ chars (other than line break chars) up to the end of string
  • $ - end of string.
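The breakdown above can be tried out with a short PHP sketch that applies the pattern to the four sample lines from the question. The variable names `$badLine`, `$samples` and `$bad` are illustrative choices, not part of the original answer.

```php
<?php
// Pattern from the answer: matches lines whose part after | is NOT a port in [1, 65535].
$badLine = '~^\d+(?:\.\d+){3}\|(?!(?:[1-9]\d{0,3}|[1-5]\d{4}|6[0-4]\d{3}|65[0-4]\d{2}|655[0-2]\d|6553[0-5])$)(.*)$~';

$samples = ['8.8.8.8|0', '8.8.8.8|1', '8.8.8.8|65536', '8.8.8.8|dawda'];

$bad = [];
foreach ($samples as $line) {
    if (preg_match($badLine, $line, $m)) {
        // Group 1 holds the rejected text that follows the |.
        $bad[$line] = $m[1];
    }
}
print_r($bad);
```

Lines (1), (3) and (4) should end up in `$bad`, each mapped to its invalid port text, while line (2) is skipped because the lookahead finds a valid port at the end of the string.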
Wiktor Stribiżew