
Hi, can someone explain why the last octet of an IP, when it is 01 or 001, is not caught by this regex?

(\.?)([2-9][5-9][6-9]|[3-9][0-9][0-9]|0[0-9][0-9]?)($|\.)

Example code:

badOctedIPv4 := "(\\.?)([2-9][5-9][6-9]|[3-9][0-9][0-9]|0[0-9][0-9]?)($|\\.)"
ipv4Format := badOctedIPv4
matchMe := regexp.MustCompile(ipv4Format)
return matchMe.FindString(input)

The input data looks like:

10.185.248.71 - - [09/Jan/2015:19:12:06 +0000] 808840 "GET /inventoryService/inventory/purchaseItem?userId=20253471&itemId=23434300 HTTP/1.1" 500 17 "-" "Apache-HttpClient/4.2.6 (java 1.5)"
[Thu Mar 13 19:04:13 2014] [error] [client 50.0.134.125] File does not exist: /var/www/favicon.ico
192.168.000.254 - - [13/Sep/2006:07:01:51 -0700] "PROPFIND /svn/[xxxx]/[xxxx]/trunk HTTP/1.1" 401 587 10 bad
092.168.000.254 - - [13/Sep/2006:07:01:51 -0700] "PROPFIND /svn/[xxxx]/[xxxx]/trunk HTTP/1.1" 401 587 9 bad
123.234.345.001 - - [13/Sep/2006:07:01:51 -0700] "PROPFIND /svn/[xxxx]/[xxxx]/trunk HTTP/1.1" 401 587 8 bad
123.234.145.001 - - [13/Sep/2006:07:01:51 -0700] "PROPFIND /svn/[xxxx]/[xxxx]/trunk HTTP/1.1" 401 587 7 bad
345.234.123.1 - - [13/Sep/2006:07:01:51 -0700] "PROPFIND /svn/[xxxx]/[xxxx]/trunk HTTP/1.1" 401 587 6 bad
092.168.72.177 - - [22/Dec/2002:23:32:14 -0400] "GET /favicon.ico HTTP/1.1" 404 1997 www.yahoo.com "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.7.3)..." "-" 5 bad
123.234.145.001 - - 4 bad
123.234.145.01 - - 3 bad
123.234.05.100 - - 2 bad
123.234.005.100 - - 1 bad
123.234.5.100 - - Last entry

The code above finds every bad IP octet except a trailing 001 or 01.

Output of the program:

❯ go run ./findInvalidIPv4.go logfile.log
[192.168.000.254] : [.000.] : 192.168.000.254 - - [13/Sep/2006:07:01:51 -0700] "PROPFIND /svn/[xxxx]/[xxxx]/trunk HTTP/1.1" 401 587 10 bad
[092.168.000.254] : [ 092.] : 092.168.000.254 - - [13/Sep/2006:07:01:51 -0700] "PROPFIND /svn/[xxxx]/[xxxx]/trunk HTTP/1.1" 401 587 9 bad
[123.234.345.001] : [.345.] : 123.234.345.001 - - [13/Sep/2006:07:01:51 -0700] "PROPFIND /svn/[xxxx]/[xxxx]/trunk HTTP/1.1" 401 587 8 bad
[  345.234.123.1] : [ 345.] : 345.234.123.1 - - [13/Sep/2006:07:01:51 -0700] "PROPFIND /svn/[xxxx]/[xxxx]/trunk HTTP/1.1" 401 587 6 bad
[ 092.168.72.177] : [ 092.] : 092.168.72.177 - - [22/Dec/2002:23:32:14 -0400] "GET /favicon.ico HTTP/1.1" 404 1997 www.yahoo.com "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.7.3)..." "-" 5 bad
[ 123.234.05.100] : [ .05.] : 123.234.05.100 - - 2 bad
[123.234.005.100] : [.005.] : 123.234.005.100 - - 1 bad

Output explained:

  • first column [...]: the full bad IP in which a bad octet was found
  • second column [...]: the bad octet; the first match is enough
  • third column: the full line passed to the function above

Can someone point out what I am missing, and why the 001 at the end does not match the pattern?

Thanks

nonus25
  • The regex has other problems too, check "100.100.100.100" - it will complain about all the "00" since the leading "\." is optional. Apart from that it is unclear what `input` actually contains. Is it really a string with only the IP inside? Or is it the full line, i.e. there is white space after the IP and not end of string? – Steffen Ullrich Mar 05 '22 at 20:01
  • https://stackoverflow.com/questions/5284147/validating-ipv4-addresses-with-regexp – selbie Mar 05 '22 at 20:10
  • Seems to match, what's the problem? Just need the multi-line mode flag for the `$` end of line. https://regex101.com/r/MWzN2J/1 and https://www.debuggex.com/r/jRY4QqbQqxHWMqDQ. Note that `(\.?)` at the beginning is not really what you want. Probably `(\.|^)` is better. – sln Mar 05 '22 at 21:49
  • @sln yes, I agree that `(\.|^)` is better as the first group, and it also fixes the issue I missed with the 100.100.100.100 octets. But if you look at either of the regex debugging URLs above, they show that IPs such as 123.234.145.001 and 123.234.145.01 should be caught, yet in Go they do not show up in the results. – nonus25 Mar 06 '22 at 10:14
  • @selbie the link you provided looks for good IPs, where a good octet is specified by `(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])`. What I want to achieve here is similar, but the opposite of what that link does. And of course I read it before I submitted this question, thanks for the info. – nonus25 Mar 06 '22 at 10:23
  • @SteffenUllrich as shown above, the input is a log file that is read line by line, and the code looks for a bad IP octet in each line. So in the example, `input` means one line of text from the log file, like this: https://regex101.com/r/BYyn1Q/1 – nonus25 Mar 06 '22 at 10:49
  • @nonus25: *"as shown above, the input is a log file ..."* - then the string does not end at the IP address but at a space. Thus you cannot match the end with `($|\.)` but also need to take white space into account: `($|\.|\s)`. – Steffen Ullrich Mar 06 '22 at 11:28
  • What are you really trying to do? Is this an academic exercise? Because invalid IP addresses showing up in a log file is kind of a weird thing to begin with. Wouldn't it be easier to do this without a regex - especially if you are coding in Go. – selbie Mar 06 '22 at 19:20

2 Answers


Your group 3 at the end:

($|\.)

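Insists on either a dot or the end of the input appearing right after the octet. That's fine for the first three octets, which are guaranteed to have a . following them. But it won't work for the last one, because in a log line the address is followed by a space rather than by a dot or the end of the string.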
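A minimal Go sketch of this point, assuming the pattern from the question is applied once to a whole log line and once to a bare address. The variable name `original` is only for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// original is the pattern from the question, copied unchanged.
// Group 3 accepts only the end of the input or a literal dot.
var original = regexp.MustCompile(`(\.?)([2-9][5-9][6-9]|[3-9][0-9][0-9]|0[0-9][0-9]?)($|\.)`)

func main() {
	// Full log line: the last octet is followed by a space, so group 3 fails.
	fmt.Printf("%q\n", original.FindString("123.234.145.001 - - 4 bad")) // ""

	// Bare address: the last octet is followed by the end of the input.
	fmt.Printf("%q\n", original.FindString("123.234.145.001")) // ".001"
}
```

Without the `m` flag, Go's `$` matches only at the end of the text, which is why the same address matches on its own but not inside a longer line.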

A simple fix is to make the trailing dot optional:

(\.?)([2-9][5-9][6-9]|[3-9][0-9][0-9]|0[0-9][0-9]?)($|\.?)

Or allow whitespace in group 3:

(\.?)([2-9][5-9][6-9]|[3-9][0-9][0-9]|0[0-9][0-9]?)(\s|$|\.)

Or just remove group 3:

(\.?)([2-9][5-9][6-9]|[3-9][0-9][0-9]|0[0-9][0-9]?)
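The three variants can be compared side by side with a small Go sketch. The variable names and the second sample line are made up for illustration; the first line is taken from the question's log.

```go
package main

import (
	"fmt"
	"regexp"
)

// octet holds the alternatives from the question, shared by all variants.
const octet = `([2-9][5-9][6-9]|[3-9][0-9][0-9]|0[0-9][0-9]?)`

var variants = []struct {
	name string
	re   *regexp.Regexp
}{
	{"optional dot", regexp.MustCompile(`(\.?)` + octet + `($|\.?)`)},
	{"whitespace", regexp.MustCompile(`(\.?)` + octet + `(\s|$|\.)`)},
	{"no group 3", regexp.MustCompile(`(\.?)` + octet)},
}

func main() {
	lines := []string{
		"123.234.145.001 - - 4 bad", // bad last octet, from the question
		"100.100.100.100 - - ok",    // hypothetical line with a valid address
	}
	for _, v := range variants {
		for _, line := range lines {
			fmt.Printf("%-12s %q -> %q\n", v.name, line, v.re.FindString(line))
		}
	}
}
```

All three now report the trailing 001, but they also report the "00" inside 100.100.100.100, because the optional leading dot lets a match start in the middle of a valid octet. That is the problem pointed out in the comments on the question.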

All of these have issues. Maybe what you really want is to match any of your three-digit sequences with either a leading dot or a trailing dot:

\.[2-9][5-9][6-9]|\.[3-9][0-9][0-9]|\.0[0-9][0-9]|[2-9][5-9][6-9]\.|[3-9][0-9][0-9]\.|0[0-9][0-9]\.

We start to get into regular expressions being "Write once read never again" territory.
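For comparison, here is a rough sketch of the mostly regex-free route suggested in the comments: pick out anything shaped like a dotted quad with a deliberately loose pattern, then check each octet with ordinary Go code. The names `dottedQuad` and `firstBadOctet`, and the rule that a leading zero counts as bad, are assumptions based on the sample output in the question.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// dottedQuad loosely matches anything shaped like an IPv4 address;
// it does not try to validate the octets.
var dottedQuad = regexp.MustCompile(`\b\d{1,3}(?:\.\d{1,3}){3}\b`)

// firstBadOctet returns the first address in line that contains an octet
// greater than 255 or written with a leading zero, plus that octet.
func firstBadOctet(line string) (ip, octet string, found bool) {
	for _, candidate := range dottedQuad.FindAllString(line, -1) {
		for _, part := range strings.Split(candidate, ".") {
			n, err := strconv.Atoi(part)
			leadingZero := len(part) > 1 && part[0] == '0'
			if err != nil || n > 255 || leadingZero {
				return candidate, part, true
			}
		}
	}
	return "", "", false
}

func main() {
	// All three lines are taken from the question and the answers.
	lines := []string{
		"123.234.145.001 - - 4 bad",
		"123.234.5.100 - - Last entry",
		"02:49:12 127.0.0.1 GET / 200",
	}
	for _, line := range lines {
		if ip, octet, found := firstBadOctet(line); found {
			fmt.Printf("[%15s] : [%5s] : %s\n", ip, octet, line)
		} else {
			fmt.Printf("no bad octet: %s\n", line)
		}
	}
}
```

Because the range check is plain integer comparison, there is no digit-by-digit pattern to get wrong, and a leading timestamp such as 02:49:12 is never considered since it contains no dots.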

selbie

@selbie thanks again for your help. With all the suggestions here I am getting closer to solving this. The regex (\.|^)([2-9][5-9][6-9]|[3-9][0-9][0-9]|0[0-9]+) seems to catch almost everything I need:

[  192.168.2.001] : [ .001] : 192.168.2.001 - - [28/Jul/2006:10:27:10 -0300] "GET /cgi-bin/try/ HTTP/1.0" 200 3395
[192.168.000.254] : [ .000] : 192.168.000.254 - - [13/Sep/2006:07:01:51 -0700] "PROPFIND /svn/[xxxx]/[xxxx]/trunk HTTP/1.1" 401 587 10 bad
[092.168.000.254] : [  092] : 092.168.000.254 - - [13/Sep/2006:07:01:51 -0700] "PROPFIND /svn/[xxxx]/[xxxx]/trunk HTTP/1.1" 401 587 9 bad
[123.234.345.001] : [ .345] : 123.234.345.001 - - [13/Sep/2006:07:01:51 -0700] "PROPFIND /svn/[xxxx]/[xxxx]/trunk HTTP/1.1" 401 587 8 bad
[123.234.145.001] : [ .001] : 123.234.145.001 - - [13/Sep/2006:07:01:51 -0700] "PROPFIND /svn/[xxxx]/[xxxx]/trunk HTTP/1.1" 401 587 7 bad
[  345.234.123.1] : [  345] : 345.234.123.1 - - [13/Sep/2006:07:01:51 -0700] "PROPFIND /svn/[xxxx]/[xxxx]/trunk HTTP/1.1" 401 587 6 bad
[  300.234.123.1] : [  300] : 300.234.123.1 - - [13/Sep/2006:07:01:51 -0700] "PROPFIND /svn/[xxxx]/[xxxx]/trunk HTTP/1.1" 401 587 6 bad
[300.300.300.300] : [  300] : 300.300.300.300 - - [13/Sep/2006:07:01:51 -0700] "PROPFIND /svn/[xxxx]/[xxxx]/trunk HTTP/1.1" 401 587 6 bad
[ 092.168.72.177] : [  092] : 092.168.72.177 - - [22/Dec/2002:23:32:14 -0400] "GET /favicon.ico HTTP/1.1" 404 1997 www.yahoo.com "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.7.3)..." "-" 5 bad
[123.234.145.001] : [ .001] : 123.234.145.001 - - 4 bad
[ 123.234.145.01] : [  .01] : 123.234.145.01 - - 3 bad
[ 123.234.05.100] : [  .05] : 123.234.05.100 - - 2 bad
[123.234.005.100] : [ .005] : 123.234.005.100 - - 1 bad

It also skips good IPs like 200.200.200.200 or 100.100.100.100, so the pattern is getting closer to working. The only case I still see going wrong is a line that starts with a time string such as 02:49:12, where the leading 02 is matched. For example:

[      127.0.0.1] : [   02] : 02:49:12 127.0.0.1 GET / 200
[      127.0.0.1] : [   02] : 02:49:35 127.0.0.1 GET /index.html 200
[      127.0.0.1] : [   03] : 03:01:06 127.0.0.1 GET /images/sponsered.gif 304
[      127.0.0.1] : [   03] : 03:52:36 127.0.0.1 GET /search.php 200
[      127.0.0.1] : [   04] : 04:17:03 127.0.0.1 GET /admin/style.css 200
[      127.0.0.1] : [   05] : 05:04:54 127.0.0.1 GET /favicon.ico 404
[      127.0.0.1] : [   05] : 05:38:07 127.0.0.1 GET /js/ads.js 200

So I am still looking for what I am missing in that regular expression.

================================

Edit: OK, this seems to do the job and is able to find the bad IP octet: (\.|^)([2-9][5-9][6-9]|[3-9][0-9][0-9]|0[0-9]+)([^:/-]). I added the last, third group ([^:/-]) to exclude any two-digit time format.
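A quick Go check of this final pattern, as a sketch. The first two lines come from the logs above; the last two are hypothetical inputs added to probe the edges. The variable name `finalPattern` is only for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// finalPattern is the regex from the edit above, copied unchanged.
var finalPattern = regexp.MustCompile(`(\.|^)([2-9][5-9][6-9]|[3-9][0-9][0-9]|0[0-9]+)([^:/-])`)

func main() {
	lines := []string{
		"123.234.145.001 - - 4 bad",    // bad last octet followed by a space
		"02:49:12 127.0.0.1 GET / 200", // leading timestamp, valid address
		"260.1.1.1 - - hypothetical",   // 260 is above 255
		"123.234.145.01",               // bad octet at the very end of the input
	}
	for _, line := range lines {
		fmt.Printf("%-32q -> %q\n", line, finalPattern.FindString(line))
	}
}
```

The first line is caught and the timestamp line is skipped, as intended. The two hypothetical lines are not caught: 260 slips through because [2-9][5-9][6-9] requires the last digit to be 6 to 9, and a two-digit bad octet at the very end of the input is missed because the third group must consume one more character.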

nonus25