I'm trying to write a regular expression that selects the valid IPv4 addresses from a file containing both valid and invalid addresses. The regex I have written mostly works, but two invalid addresses are still printed: 255.255.256.255 and 8.234.88,55. Can anyone help me understand why these two are printed with the regex below?

((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){1,3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)

I am using this regex to filter the valid IPv4 addresses from a file containing the addresses listed below.

12.12.12.12
127.0.0.0
255.255.256.255
344.19.0.1.
12.255.12.255
138.168.5.193
256.123.256.123
195.45.13.0
8.234.88.55
1334.0.1.234
196.83.83.191
133.133.133.133
8.234.88,55
203.26.27.38
88.173.71.66
136.186.20.9
241.92.88.103

I want to know why this regex matches the addresses 255.255.256.255 and 8.234.88,55.

  • Do not post links or screenshots. Instead, post a [minimal reproducible example](https://stackoverflow.com/help/minimal-reproducible-example). – Dour High Arch Aug 15 '20 at 23:42
  • [Please don't post images of text](https://unix.meta.stackexchange.com/questions/4086/psa-please-dont-post-images-of-text?more_on=xron.net). People here regard that as evil. – Ron Aug 15 '20 at 23:43
  • I have made changes as suggested. Thanks – Gagan Ghotra Aug 15 '20 at 23:51
  • `why this regex expression is matching with 255.255.256.255 and 8.234.88,55 IPv4 addresses` It doesn't. Your method of checking if the regex matches is flawed. – KamilCuk Aug 16 '20 at 00:52
  • There are over 200+ other Q/A when searching for `regex for IPV4 addresses`. That is the purpose of maintaining a database filled with common questions and their answers ;-) ... However, glad you're getting some good feedback below. Good luck. – shellter Aug 16 '20 at 15:43

3 Answers

why this regex expression is matching with 255.255.256.255 and 8.234.88,55 IPv4 addresses.

It doesn't. It matches parts of that string. Most probably you did:

$ echo '255.255.256.255' | grep -E '((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){1,3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'
255.255.256.255

Yay, it works. But the pattern doesn't match the whole line; it matches the parts 255.255.25 and 6.255 separately. The {1,3} allows the first group to match only once or twice, not necessarily 3 times. Like:

 ((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.)((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.)(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
   25  5                                 .   25  5                                 .                             2    5    6.255
                                                                                                                           ^^^^^ - left over

Because of the {1,3}, the first group may match as few as one time. grep prints a line whenever the regex matches any part of it, and here the full regex matches a substring, so the line is printed.
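One way to see the separate matches is the -o option, which prints each matched fragment on its own line instead of the whole input line. This is only a sketch: -o is an extension found in GNU and BSD grep rather than part of POSIX, and the variable names are made up for illustration.

```shell
# The regex from the question, unchanged.
pattern='((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){1,3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'

# -o prints every non-overlapping match separately.
matches=$(printf '%s\n' '255.255.256.255' | grep -oE "$pattern")
printf '%s\n' "$matches"
```

With GNU grep this prints 255.255.25 and 6.255 on separate lines, which are the two fragments described above.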

Similarly, for 8.234.88,55 the part 8.234.88 is matched and ,55 is not. It is cool to see this with --color:

$ echo '8.234.88,55' | grep --color -E '(((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){1,3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){1}'
8.234.88,55
^^^^^^^^ - is red

To match the whole line, use grep -x or add the anchors ^...$. You most probably also want to change {1,3} to {3}, so that exactly 3 dot-terminated parts come before the last one.
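A small sketch of why both changes are worth making together, assuming grep with -E and -x. The octet variable and the sample inputs 1.2 and the result variable names are illustrative, not from the question.

```shell
# One octet, as written in the question.
octet='(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'

# -x alone: {1,3} still accepts a line with only two parts.
only_x=$(printf '%s\n' '1.2' '8.234.88,55' | grep -xE "($octet\.){1,3}$octet")

# {3} alone: without -x, a substring of a bad line (34.0.1.234) still matches.
only_3=$(printf '%s\n' '1334.0.1.234' '255.255.256.255' | grep -E "($octet\.){3}$octet")

# Both together: only the well-formed address survives.
both=$(printf '%s\n' '1.2' '1334.0.1.234' '255.255.256.255' '8.234.88,55' '8.234.88.55' \
  | grep -xE "($octet\.){3}$octet")

printf 'only -x: %s\nonly {3}: %s\nboth: %s\n' "$only_x" "$only_3" "$both"
```

Here -x alone lets 1.2 through, {3} alone lets 1334.0.1.234 through, and the combination keeps only 8.234.88.55.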

KamilCuk

((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.)

I've tried your expression in C++. Adding an extra backslash before the dot solved the comma issue for me.

It parsed a comma because you are missing a backslash; the way it is written, the dot is interpreted as "match any character except EOL".

Also, your expression allows values to be prefixed by a 0 because of the [01]?.

Here is a suggestion on how to tackle the expression: if the value has only one digit, how can it be written? Then two digits, then three...

(([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\\.){3}([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])
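The leading-zero difference can be checked with grep. This is a sketch under assumptions: the patterns use a single backslash because they go to grep directly rather than through a C++ string literal, -x is added to force whole-line matching, and the variable names and sample inputs are illustrative.

```shell
# Octet from the question: [01]? permits a leading 0.
loose='(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'
# Octet from this answer: one, two, or three digits with no leading 0.
strict='([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])'

# Count how many of the two sample lines each pattern accepts.
loose_hits=$(printf '%s\n' '01.02.03.04' '1.2.3.4' | grep -cxE "($loose\.){3}$loose")
strict_hits=$(printf '%s\n' '01.02.03.04' '1.2.3.4' | grep -cxE "($strict\.){3}$strict")

printf 'loose: %s, strict: %s\n' "$loose_hits" "$strict_hits"
```

The looser octet accepts both lines, while the stricter one rejects 01.02.03.04 and accepts only 1.2.3.4.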
vmp
  • You confuse the escaping of ``\`` required by `regex` with the escaping required by the C++ rules of writing strings as source code. You have to use double ``\`` in the C++ source code but it is, in fact, only one ``\`` in the string. The other one is required by the language. The original regular expression is correct at that point. If wrapped in apostrophes, the `regex` provided in the question can be used as-is in the command line, there is no need for extra escaping. – axiac Aug 16 '20 at 00:17
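The quoting point can be checked directly in a shell. A minimal sketch, assuming grep -E and single-quoted patterns; the shortened pattern and the variable names are illustrative only.

```shell
# Unescaped dot: "." matches any character, so the comma line is accepted too.
unescaped=$(printf '%s\n' '88,55' '88.55' | grep -cE '^88.55$')

# Single backslash inside single quotes: the shell passes \. to grep unchanged,
# so only a literal dot matches.
escaped=$(printf '%s\n' '88,55' '88.55' | grep -cE '^88\.55$')

printf 'unescaped: %s, escaped: %s\n' "$unescaped" "$escaped"
```

The unescaped pattern counts both lines, while the single-backslash pattern counts only 88.55, so no second backslash is needed on the command line.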

Your regular expression is not anchored to the beginning and end of the strings. It matches fragments of each line, not the entire line.

Put your regex between ^ and $.

^ matches the beginning of the string; $ matches the end of the string.

If multi-line matching is enabled, ^ matches the beginning of a line, $ matches the end of a line.

Also, the regex is slightly incorrect, which lets it match fewer components than it should. An IPv4 address always has 4 components. Because of {1,3}, your regex allows 2 to 4 components. Combined with the lack of anchors, it finds partial matches in the lines you mentioned: 255.255.25 and 6.255 in the first, 8.234.88 in the second.

Take a look at regex101.com.

The regex should be:

^((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$
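As a quick check, the anchored regex can be run over the list from the question. This is a sketch assuming grep -E; the variable names are illustrative, and the addresses are held in a shell variable here instead of a file.

```shell
regex='^((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$'

# The sample addresses from the question, one per line.
addresses='12.12.12.12
127.0.0.0
255.255.256.255
344.19.0.1.
12.255.12.255
138.168.5.193
256.123.256.123
195.45.13.0
8.234.88.55
1334.0.1.234
196.83.83.191
133.133.133.133
8.234.88,55
203.26.27.38
88.173.71.66
136.186.20.9
241.92.88.103'

# Keep only the lines that match the regex from start to end.
valid=$(printf '%s\n' "$addresses" | grep -E "$regex")
printf '%s\n' "$valid"
```

The five malformed lines (255.255.256.255, 344.19.0.1., 256.123.256.123, 1334.0.1.234 and 8.234.88,55) are dropped and the remaining twelve are printed.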
axiac