-6

I have a plain-text document from which I'm trying to extract all of the IP addresses. I was able to extract them with regular expressions, but the pattern also grabs a large number of version numbers. I tried using string.find(), but that requires knowing the character that marks the end of a line (the IP addresses are always the last thing on a line), and I don't know which end-of-line character the file uses. Does anyone know how I could pull these addresses out?

Martijn Pieters
user1771694
  • 9
    How about posting a piece of your document and the code you've written so far? – SpankMe May 24 '13 at 20:42
  • 1
    "escape character used for the end of the line" -- do you mean the line separator, usually `\n` or `\r\n`? – Janne Karila May 24 '13 at 20:45
  • Look at the `re` module and this link: http://answers.oreilly.com/topic/318-how-to-match-ipv4-addresses-with-regular-expressions/ – 0x90 May 24 '13 at 20:46

2 Answers

3

If your addresses are always on the end of a line, then anchor on that:

ip_at_end = re.compile(r'(?:[0-9]{1,3}\.){3}[0-9]{1,3}$', re.MULTILINE)

This regular expression only matches dotted quads (4 sets of digits with dots in between) at the end of a line. It does not check that each number is in the range 0-255.

Demo:

>>> import re
>>> ip_at_end = re.compile(r'(?:[0-9]{1,3}\.){3}[0-9]{1,3}$', re.MULTILINE)
>>> example = '''\
... Only addresses on the end of a line match: 123.241.0.15
... Anything else doesn't: 124.76.67.3, even other addresses.
... Anything that is less than a dotted quad also fails, so 1.1.4
... does not match but 1.2.3.4
... will.
... '''
>>> ip_at_end.findall(example)
['123.241.0.15', '1.2.3.4']
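
On the worry about not knowing the end-of-line character: with `re.MULTILINE`, `$` matches just before `\n`, so a stray `\r` from Windows-style line endings would block the match if the text was read without newline translation. One way around that, sketched here under the assumption that the text is already in a string, is to split with `str.splitlines()`, which recognises `\n`, `\r\n` and `\r` among others, and match each line on its own. The helper name `ips_at_line_ends` is made up for illustration.

```python
import re

# Same dotted-quad pattern as above; re.MULTILINE is not needed because
# each line is matched separately after splitting.
ip_at_end = re.compile(r'(?:[0-9]{1,3}\.){3}[0-9]{1,3}$')

def ips_at_line_ends(text):
    """Return the dotted quad that ends each line, skipping lines without one."""
    found = []
    for line in text.splitlines():
        # rstrip() drops trailing spaces or tabs left before the line break
        match = ip_at_end.search(line.rstrip())
        if match:
            found.append(match.group())
    return found

sample = "host a 10.0.0.1\r\nversion 2.7.3\r\nhost b 192.168.1.20 \nno address here\n"
print(ips_at_line_ends(sample))  # ['10.0.0.1', '192.168.1.20']
```

When reading from a file, opening it in text mode with Python's default newline handling already translates `\r\n` and `\r` to `\n`, so either approach should then work.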
Martijn Pieters
2

Description

This will match and validate IPv4 addresses, ensuring that the individual octets are within the range 0-255. The `\b` word boundaries keep a match from starting or stopping in the middle of a run of digits.

\b(?:(?:[0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}(?:[0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\b

[Image: diagram of the regular expression, generated with Debuggex]
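
For the Python case there is an alternative to encoding the 0-255 range in the pattern itself: match candidates loosely and let the standard library's `ipaddress` module reject anything out of range. This is a sketch of that substitute technique, not the expression above; `find_valid_ipv4` and `candidate` are hypothetical names chosen for illustration.

```python
import ipaddress
import re

# Loose candidate: four dot-separated runs of 1-3 digits that are not part
# of a longer run of digits and dots (so "1.2.3.4.5" is skipped entirely).
candidate = re.compile(
    r'(?<![0-9])(?<![0-9]\.)(?:[0-9]{1,3}\.){3}[0-9]{1,3}(?!\.?[0-9])'
)

def find_valid_ipv4(text):
    """Return the candidates that ipaddress accepts as IPv4 addresses."""
    valid = []
    for found in candidate.findall(text):
        try:
            ipaddress.IPv4Address(found)  # raises ValueError if an octet is > 255
        except ValueError:
            continue
        valid.append(found)
    return valid

text = "valid 12.34.56.78, invalid 12.34.567.89, version 1.2.3.4.5, last 10.0.0.255."
print(find_valid_ipv4(text))  # ['12.34.56.78', '10.0.0.255']
```

The trade-off is a simpler pattern at the cost of one extra validation call per candidate.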

Disclaimer

Yes, I realize the OP asked for a Python solution. This PHP example is included only to show how the expression works.

PHP example

<?php
// Sample text: the first address is valid, the second has an octet above 255.
$sourcestring="this is a valid ip 12.34.56.78
this is not valid ip 12.34.567.89";
// \b keeps a match from stopping partway through an octet.
preg_match_all('/\b(?:(?:[0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}(?:[0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\b/',$sourcestring,$matches);
echo "<pre>".print_r($matches,true);
?>

$matches Array:
(
    [0] => Array
        (
            [0] => 12.34.56.78
        )

)
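
Since the question asked for Python, here is a rough Python counterpart of the same idea, offered as a sketch. It lists the longest octet alternatives first and wraps the pattern in `\b` word boundaries so that a match cannot start or stop partway through a run of digits; the names `OCTET`, `ipv4` and `source` are illustrative.

```python
import re

# One octet in the range 0-255, longest alternatives first.
OCTET = r'(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])'

# Four octets separated by dots; \b stops a match from starting or
# ending in the middle of a run of digits.
ipv4 = re.compile(r'\b' + OCTET + r'(?:\.' + OCTET + r'){3}\b')

source = """this is a valid ip 12.34.56.78
this is not valid ip 12.34.567.89"""

print(ipv4.findall(source))  # ['12.34.56.78']
```

Note that `\b` alone still allows a match inside a longer dotted string such as a five-part version number. If the addresses only ever appear at the end of a line, appending `$` to the pattern and compiling with `re.MULTILINE` narrows it further.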
Ro Yo Mi
  • How did you generate that awesome graph? Is there a website that does that for regex input? – SethMMorton May 24 '13 at 20:58
  • @SethMMorton Yes, for this I'm using http://www.debuggex.com/. If you use it, keep in mind that it supports JavaScript-style expressions and doesn't understand lookbehinds. – Ro Yo Mi May 24 '13 at 21:04
  • @denomales Not sure if you've seen it since you posted this answer, but debuggex can now generate the image for you so you don't have to go through the trouble of copy/pasting/cropping :) – Sergiu Toarca May 31 '13 at 02:01
  • Excellent! You had told me that feature was on the way. It looks really good, thank you :) – Ro Yo Mi May 31 '13 at 02:54