How to search a document for IP addresses

Question

So I have a document (plain text) that I'm trying to extract all of the IP addresses from. I was able to extract them using regular expressions but it also grabs a large number of version numbers. I tried using string.find() but it requires that I be able to locate the escape character used for the end of the line (the IP addresses are always the last thing on a line) and the escape character used for the end of the line is unknown to me. Anyone know how I could pull these addresses out?

How about posting a piece of your document and the code you've written so far? — SpankMe, May 24 '13 at 20:42
"escape character used for the end of the line" -- do you mean the line separator, usually `\n` or `\r\n`? — Janne Karila, May 24 '13 at 20:45
look for `re` and use this link http://answers.oreilly.com/topic/318-how-to-match-ipv4-addresses-with-regular-expressions/ — 0x90, May 24 '13 at 20:46

score 3 · Accepted Answer · answered May 24 '13 at 20:44

If your addresses are always at the end of a line, then anchor on that:

ip_at_end = re.compile(r'(?:[0-9]{1,3}\.){3}[0-9]{1,3}$', re.MULTILINE)

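This regular expression matches only dotted quads (four groups of 1 to 3 digits separated by dots) at the end of a line. With `re.MULTILINE`, `$` matches just before each newline as well as at the end of the string. Note that it does not check that each group is in the range 0-255.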

Demo:

>>> import re
>>> ip_at_end = re.compile(r'(?:[0-9]{1,3}\.){3}[0-9]{1,3}$', re.MULTILINE)
>>> example = '''\
... Only addresses on the end of a line match: 123.241.0.15
... Anything else doesn't: 124.76.67.3, even other addresses.
... Anything that is less than a dotted quad also fails, so 1.1.4
... does not match but 1.2.3.4
... will.
... '''
>>> ip_at_end.findall(example)
['123.241.0.15', '1.2.3.4']
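
If the line separator is unknown, as in the question, one caveat applies: with `re.MULTILINE`, `$` matches just before `\n` only, so a file with `\r\n` line endings leaves a `\r` between the address and the anchor. A minimal sketch that sidesteps this by splitting the text into lines first is shown below. The names `TRAILING_QUAD` and `trailing_ips` and the sample text are made up for illustration; stripping trailing whitespace is an assumption about the input.

```python
import re

# Dotted quad anchored at the end of one line. MULTILINE is not needed
# because each line is searched separately.
TRAILING_QUAD = re.compile(r'(?:[0-9]{1,3}\.){3}[0-9]{1,3}$')


def trailing_ips(text):
    """Return the dotted quad that ends each line, where there is one."""
    found = []
    # splitlines() splits on \n, \r\n and \r, so the separator need not be known.
    for line in text.splitlines():
        # Assumption: trailing spaces or tabs after the address are not significant.
        match = TRAILING_QUAD.search(line.rstrip())
        if match:
            found.append(match.group(0))
    return found


# Hypothetical sample with Windows-style line endings.
sample = "host a 10.0.0.1\r\nversion 1.2.3\r\nhost b 192.168.1.20  \r\n"
print(trailing_ips(sample))  # ['10.0.0.1', '192.168.1.20']
```

Opening the file in text mode with Python 3's default newline handling should also translate `\r\n` to `\n`, in which case the original `re.MULTILINE` pattern works unchanged.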

Martijn Pieters


what about ipv6 ? http://stackoverflow.com/a/6276240/1031417 – 0x90 May 24 '13 at 20:49
0x90: I was assuming IPv4 because the OP said version numbers were interfering; IPv6 addresses, which use `:` as a delimiter, are rarely mistaken for software versions. – Martijn Pieters May 24 '13 at 20:51
The OP is also tagged as ipv4. – Ro Yo Mi May 24 '13 at 21:06
Even better, not sure how I missed that. :-P – Martijn Pieters May 24 '13 at 21:07
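
If IPv6 addresses do turn up, a different technique from the regex above may be worth a try: let the standard library's `ipaddress` module decide what parses as an address. The sketch below assumes the address is the last whitespace-separated token on its line, which matches the question's description. The function name `last_token_ips` and the sample text are hypothetical.

```python
import ipaddress


def last_token_ips(text):
    """Collect the final token of each line when it parses as an IP address."""
    found = []
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        try:
            # ip_address() accepts IPv4 and IPv6 strings and raises
            # ValueError for anything else, including out-of-range octets.
            found.append(ipaddress.ip_address(tokens[-1]))
        except ValueError:
            pass
    return found


# Hypothetical sample: one IPv4 address, a version number, one IPv6
# address, and a dotted quad with an octet above 255.
sample = """\
gateway 192.168.0.1
library version 2.7.3
resolver 2001:db8::1
bad octet 10.0.300.1
"""
for address in last_token_ips(sample):
    print(address.version, address)
```

A four-part version number such as `1.2.3.4` would still parse as an IPv4 address; no purely syntactic check can tell those apart.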

Ro Yo Mi · Answer 2 · 2013-05-24T21:02:05.280

2

Description

This expression matches IPv4 addresses and checks that each octet is in the range 0-255. The word boundaries (`\b`) keep a match from stopping partway through a number.

\b(?:(?:[0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}(?:[0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\b

[Image: diagram of the regular expression, generated with debuggex]

Disclaimer

Yes, I realize the OP asked for a Python solution. This PHP solution is only included to show how the expression works.

PHP example

<?php
$sourcestring="this is a valid ip 12.34.56.78
this is not valid ip 12.34.567.89";
// \b on both ends keeps a match from stopping partway through a number
preg_match_all('/\b(?:(?:[0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}(?:[0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\b/i',$sourcestring,$matches);
echo "<pre>".print_r($matches,true);
?>

$matches Array:
(
    [0] => Array
        (
            [0] => 12.34.56.78
        )

)
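
Since the question asked for Python, here is a rough equivalent using the `re` module on the same two sample lines. It is a sketch, not a drop-in solution. The names `OCTET` and `IPV4` are made up here, the alternatives are ordered longest first, and the `\b` word boundaries are included so that a match cannot end in the middle of a number.

```python
import re

# One octet in the range 0-255, longest alternatives first.
OCTET = r'(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])'

# Four octets separated by dots, delimited by word boundaries.
IPV4 = re.compile(r'\b(?:' + OCTET + r'\.){3}' + OCTET + r'\b')

source = "this is a valid ip 12.34.56.78\nthis is not valid ip 12.34.567.89"

# The pattern has no capturing groups, so findall() returns whole matches.
print(IPV4.findall(source))  # ['12.34.56.78']
```

Appending `$` and passing `re.MULTILINE` would restrict this to addresses at the end of a line, as in the accepted answer. A longer dotted string such as `1.2.3.4.5` would still produce a match on its first four parts, so further filtering may be needed for such input.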

edited May 24 '13 at 21:02

answered May 24 '13 at 20:56


How did you generate that awesome graph? Is there a website that does that for regex input? – SethMMorton May 24 '13 at 20:58
@SethMMorton: Yes, for this I'm using http://www.debuggex.com/. If you use it, keep in mind that it supports JavaScript-style expressions and doesn't understand lookbehinds. – Ro Yo Mi May 24 '13 at 21:04
@denomales Not sure if you've seen it since you posted this answer, but debuggex can now generate the image for you so you don't have to go through the trouble of copy/pasting/cropping :) – Sergiu Toarca May 31 '13 at 02:01
Excellent! You had told me that feature was on the way. It looks really good, thank you :) – Ro Yo Mi May 31 '13 at 02:54
