I need a regex that matches IPs with a prefix length, like 192.1.2.33/23, even when they contain spaces or newlines, for example:

30.0.
0.0/24

I tried this one:

\b(((\s+)?[1-9](\s+)?[0-9]?(\s+)?[0-9]?(\s+)?)\.((\s+)?[0-9](\s+)?[0-9]?(\s+)?[0-9]?(\s+)?)\.((\s+)?[0-9](\s+)?[0-9]?(\s+)?[0-9]?(\s+)?)\.((\s+)?[0-9](\s+)?[0-9]?(\s+)?[0-9]?(\s+)?)\/((\s+)?[0-9](\s+)?[0-9]?(\s+)?))\b

But it doesn't work well... (also, it's so damn long!)

Any help is appreciated.

EDIT:

When I use it with Python, it sometimes drops the leading digits of an IP that is split across a newline. Here is the code I use:

import re

with open(r"AllText.txt") as fp:
    for line in fp:
        # regexp_v3 holds the pattern being tested; each line is matched on its own
        for i in re.finditer(regexp_v3, line):
            print(i.group())

For example, try it on this text:

 "172.18.177.240/28","ewwefwfwef","172.18.176.240/28","D.edwefwefwef
e_fe","172.18.230.0/24","172.18.177.128/28","dewefgw-1.wefre_fe","172.18.176.128/28","efSwefefef.eI-nc_rwefstowefe","17
2.18.183.0/24","PAT 

EDIT 2:

The problem is "You are reading the file row by row and match your regex always only against a single row. How should the regex start matching from end of row a when it sees only row b?"

So, the question now is: how can I read the whole file at once, so that the regex sees all of the text?

Con7e

1 Answer

What is not working well?

  • As a first hint, you can replace (\s+)? with \s*. Both match the same text; you only lose the capture group.

  • It also makes no sense to match whitespace at the very start and the very end of the pattern.
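A quick way to convince yourself of the first hint is to compare the two spellings on a toy pattern. This is only a sketch: the patterns `1(\s+)?2` and `1\s*2` and the sample strings are made up for illustration and are not taken from the question.

```python
import re

# Two tiny illustrative patterns that differ only in how the optional
# whitespace is written.
verbose = re.compile(r"1(\s+)?2")
compact = re.compile(r"1\s*2")

samples = ["12", "1 2", "1\n2", "1 \n\t2", "1x2"]

def agrees(text):
    # Both patterns must either fail together or match the same text.
    a, b = verbose.fullmatch(text), compact.fullmatch(text)
    return (a is None) == (b is None) and (a is None or a.group(0) == b.group(0))

results = [agrees(s) for s in samples]
print(results)
```

The only observable difference is that the first form creates a capture group, which the pattern in the question never uses.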

With those two "improvements" you end up here:

\b(([1-9]\s*[0-9]?\s*[0-9]?\s*)\.(\s*[0-9]\s*[0-9]?\s*[0-9]?\s*)\.(\s*[0-9]\s*[0-9]?\s*[0-9]?\s*)\.(\s*[0-9]\s*[0-9]?\s*[0-9]?\s*)\/(\s*[0-9]\s*[0-9]?))\b

You can make it even shorter by using the quantifier {0,2} instead of repeating the character classes:

\b(([1-9](?:\s*[0-9]){0,2}\s*)\.(\s*[0-9](?:\s*[0-9]){0,2}\s*)\.(\s*[0-9](?:\s*[0-9]){0,2}\s*)\.(\s*[0-9](?:\s*[0-9]){0,2}\s*)\/((?:\s*[0-9]{1,2})))\b

It is only 4 characters shorter, but IMO also more readable than repeating the optional character classes.
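If readability is the goal, one option in Python is to assemble the pattern from a single octet fragment. The following is a minimal sketch, not the only way to do it: the names `OCTET`, `FIRST_OCTET`, `CIDR` and `find_cidrs` are hypothetical, the capture groups are dropped, and the matched text is normalized by removing the embedded whitespace.

```python
import re

# Hypothetical names; the fragments mirror the {0,2} variant of the pattern.
OCTET = r"[0-9](?:\s*[0-9]){0,2}"        # 1 to 3 digits, whitespace allowed between them
FIRST_OCTET = r"[1-9](?:\s*[0-9]){0,2}"  # same, but must not start with 0

CIDR = re.compile(
    r"\b" + FIRST_OCTET + (r"\s*\.\s*" + OCTET) * 3 + r"\s*/\s*[0-9]{1,2}\b"
)

def find_cidrs(text):
    """Return every match with the embedded whitespace removed."""
    return ["".join(m.group(0).split()) for m in CIDR.finditer(text)]

print(find_cidrs("30.0.\n0.0/24"))
```

Like the patterns above, this does not check that an octet is at most 255 or that the prefix length is at most 32; it only looks at the shape of the text.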

stema
  • When I use it in Python, it sometimes strips off the first 2 digits of an IP if they are separated from the rest by a newline, as in the example in my question. – Con7e Apr 17 '14 at 11:22
  • I can't see why your regex should do this. In this case you should provide a working piece of code that reproduces the problem. – stema Apr 17 '14 at 11:27
  • OK, I think I see. You are reading the file row by row and always match your regex against a single row only. How should the regex start matching at the end of row a when it sees only row b? – stema Apr 17 '14 at 11:39
  • Nice question, and I think you found where the problem is. So, how can I read all the text "as a whole" to get the regex to work? – Con7e Apr 17 '14 at 11:42
  • I am not a Python pro and don't know the "best" pythonic way to do this. An obvious solution is to read the file as you did and, instead of matching, append all the rows to one string (check that the newlines are still there!), but there may be a better solution in Python that reads the file all at once. – stema Apr 17 '14 at 11:46
  • No prob stema, thank you for helping in discovering the problem. I'll ask a different question. I'll give you the points, since the regex works, python doesn't :P – Con7e Apr 17 '14 at 11:48
  • Have a look at existing questions, e.g. [Reading entire file in Python](http://stackoverflow.com/questions/7409780/reading-entire-file-in-python) – stema Apr 17 '14 at 11:49
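To make the point from the comments concrete, here is a minimal sketch that contrasts matching line by line with matching against the whole file read through `fp.read()`. The helper names `find_in_file` and `find_per_line`, the temporary demo file, and the compact spelling of the pattern in `PATTERN` are assumptions made for illustration; any of the pattern variants above should behave the same way here.

```python
import os
import re
import tempfile

# Compact spelling of the pattern, without capture groups (an assumption).
PATTERN = (
    r"\b[1-9](?:\s*[0-9]){0,2}"
    r"(?:\s*\.\s*[0-9](?:\s*[0-9]){0,2}){3}"
    r"\s*/\s*[0-9]{1,2}\b"
)

def find_in_file(path, pattern=PATTERN):
    """Match against the full file contents, newlines included."""
    with open(path) as fp:
        text = fp.read()  # the whole file as one string
    return [m.group() for m in re.finditer(pattern, text)]

def find_per_line(path, pattern=PATTERN):
    """Match each line on its own, as in the question's code."""
    with open(path) as fp:
        return [m.group() for line in fp for m in re.finditer(pattern, line)]

# Demo file whose second IP is split across a newline, as in the question.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write('"172.18.176.128/28","17\n2.18.183.0/24","PAT')
    demo_path = tmp.name

whole = find_in_file(demo_path)
per_line = find_per_line(demo_path)
os.remove(demo_path)

print(whole)     # the split IP is found, with the newline still inside the match
print(per_line)  # the split IP loses its first two digits
```

The matches from the whole-file version still contain the newline, so you would probably want to remove whitespace afterwards, for example with `"".join(match.split())`.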