I have the following string:

text = '10.0.0.1.1 but 127.0.0.256 1.1.1.1'

and I want to return only the valid IP addresses, so here it should return only 1.1.1.1: 256 is greater than 255, and the first address has too many numbers.

So far I have the following, but it doesn't enforce the 0-255 requirement.

import re

text = "10.0.0.1.1 but 127.0.0.256 1.1.1.1"
l = []
for word in text.split(" "):
    if word.count(".") == 3:
        l = re.findall(r"[\d{1,3}]+\.[\d{1,3}]+\.[\d{1,3}]+\.[\d{1,3}]+", word)
Sven Hohenstein
RustyShackleford
  • why not just [use google](http://answers.oreilly.com/topic/318-how-to-match-ipv4-addresses-with-regular-expressions)? – tenub Jan 10 '14 at 15:26
  • can also try [this one](http://www.mkyong.com/regular-expressions/how-to-validate-ip-address-with-regular-expression). – tenub Jan 10 '14 at 15:32
  • May be useful: http://stackoverflow.com/questions/11264005/using-a-regex-to-match-ip-addresses-in-python – Captain Caveman Jan 10 '14 at 15:34
  • Why not just use `ipaddress`? http://docs.python.org/3/howto/ipaddress.html There is a port for Python 2.x on PyPI. – gawel Jan 10 '14 at 15:34
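
The `ipaddress` suggestion in the last comment could look roughly like the sketch below. It is only an illustration, not code from the thread: the helper name `valid_ipv4_tokens` is hypothetical, and it assumes addresses are separated by whitespace, as in the example string. Note that how `ipaddress` treats octets with leading zeros (such as `001`) has varied between Python versions.

```python
import ipaddress


def valid_ipv4_tokens(text):
    """Return the whitespace-separated tokens that parse as IPv4 addresses.

    Hypothetical helper for illustration; assumes whitespace-delimited input.
    """
    found = []
    for token in text.split():
        try:
            # Raises AddressValueError (a ValueError subclass) on bad input,
            # e.g. too many octets or an octet above 255.
            ipaddress.IPv4Address(token)
        except ValueError:
            continue
        found.append(token)
    return found


text = "10.0.0.1.1 but 127.0.0.256 1.1.1.1"
print(valid_ipv4_tokens(text))  # ['1.1.1.1']
```

This leaves the range check to the standard library instead of encoding 0-255 in a pattern.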

1 Answer

Here is a Python regex that does a pretty good job of extracting valid IPv4 addresses from a string:

import re
reValidIPv4 = re.compile(r"""
    # Match a valid IPv4 in the wild.
    (?:                                         # Group two start-of-IP assertions.
      ^                                         # Either the start of a line,
    | (?<=\s)                                   # or preceded by whitespace.
    )                                           # Group two start-of-IP assertions.
    (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)    # First number in range 0-255.
    (?:                                         # Exactly 3 additional numbers.
      \.                                        # Numbers separated by dot.
      (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)  # Number in range 0-255.
    ){3}                                        # Exactly 3 additional numbers.
    (?=$|\s)                                    # End IP on whitespace or EOL.
    """, re.VERBOSE | re.MULTILINE)

text = "10.0.0.1.1 but 127.0.0.256 1.1.1.1"
l = reValidIPv4.findall(text)
print(l)
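
To see where the pattern draws the line, here is a small sketch that is not part of the original answer. It restates the same expression in compact, non-verbose form (the names `OCTET`, `IPV4`, and `samples` are made up for this illustration) and runs it on a few edge cases. It assumes the compact form is equivalent to the verbose one above.

```python
import re

# Same octet alternation as in the verbose pattern: 0-255, leading zeros allowed.
OCTET = r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"

# Compact restatement: start of line or preceding whitespace, four octets,
# then whitespace or end of line.
IPV4 = re.compile(
    rf"(?:^|(?<=\s)){OCTET}(?:\.{OCTET}){{3}}(?=$|\s)",
    re.MULTILINE,
)

samples = "255.255.255.255 256.1.1.1 0.00.000.001 1.2.3 1.2.3.4, 9.9.9.9"
print(IPV4.findall(samples))
# ['255.255.255.255', '0.00.000.001', '9.9.9.9']
```

Two things stand out: zero-padded forms such as `0.00.000.001` are accepted, and `1.2.3.4,` is rejected because the closing lookahead only allows whitespace or end of line, so an address followed directly by punctuation is skipped.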
ridgerunner
  • I'm confused by your comments, does it start on group two or one? I see that comment listed twice and I'm trying to understand more. – RustyShackleford Jan 10 '14 at 18:50
  • @wannabe_n00b - I can see why you were confused - poor wording on my part. There are actually no capture groups in this regex. The first (non-capturing) group is: _"grouping two alternatives, each of which is an assertion"_ I always repeat the comment at the close of each group to associate the start and the end of the group comment-wise. – ridgerunner Jan 10 '14 at 19:47
  • What would the effect be if I changed your code to `[01]?[0-9]?[0-9]?` It seems like it would be better? – RustyShackleford Apr 22 '14 at 15:57
  • @wannabe_n00b - The expression: `[01]?[0-9]?[0-9]?` matches an empty string (i.e. this matches every position in every string that has ever existed). This won't work because there needs to be at least one digit in each of the 4 IPv4 dotted quad positions. – ridgerunner Apr 22 '14 at 18:14
  • If you had an IP that was "0.0.0.1", how would it be evaluated against `[01]?[0-9][0-9]?`? Wouldn't the `[01]?` pick up the digit but then fail on the mandatory `[0-9]`? – RustyShackleford Apr 22 '14 at 18:25
  • @wannabe_n00b - Yes, but since the previous `[01]?` match was optional, the regex engine will backtrack and the required `[0-9]` can then match. Note that this regex was taken from the book: [Mastering Regular Expressions (3rd Edition)](http://www.amazon.com/Mastering-Regular-Expressions-Jeffrey-Friedl/dp/0596528124 "By Jeffrey Friedl. Best book on Regex - ever!"). This expression may seem a bit peculiar, but it is quite efficient at matching other possible IPv4 forms such as: `0.00.000.001` – ridgerunner Apr 23 '14 at 18:13
  • So if the expression read `[01]?[4-5][4-5]?` instead, would the optional `[01]` then be matched with 0 and 1? Basically, it skips the optional part and returns to it if necessary? – RustyShackleford Apr 24 '14 at 16:21
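
The behaviour discussed in these comments can be checked directly. The sketch below is an illustration only (the variable names are made up here). It uses `re.fullmatch` to test each octet sub-pattern in isolation, which is an assumption of this example, not how the sub-pattern is used inside the full IPv4 regex.

```python
import re

# Octet sub-pattern from the answer: the middle [0-9] is mandatory.
required = re.compile(r"[01]?[0-9][0-9]?")
# Variant proposed in the comments: every part is optional.
all_optional = re.compile(r"[01]?[0-9]?[0-9]?")
# Variant from the last comment.
four_five = re.compile(r"[01]?[4-5][4-5]?")

# The all-optional variant accepts the empty string; the original does not.
print(all_optional.fullmatch("") is not None)  # True
print(required.fullmatch("") is not None)      # False

# "0": [01]? first consumes the "0", the mandatory [0-9] then has nothing
# left to match, so the engine backtracks, lets [01]? match nothing, and
# [0-9] takes the "0" instead.
print(required.fullmatch("0") is not None)     # True

# With [4-5] as the mandatory part, a leading 0 or 1 is consumed by [01]?
# when present, and skipped when absent.
for s in ("4", "04", "145", "0"):
    print(s, four_five.fullmatch(s) is not None)
# 4 True
# 04 True
# 145 True
# 0 False
```

In the last case, `"0"` fails because after backtracking there is still no digit in the range 4-5 for the mandatory part to match.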