How do I find a pattern like 252.63.71.62 in a text in Python with re(gex)?

Question

I have a webpage whose text I get using the resources module in Python. How can I find a pattern of numbers like 126.23.73.34 in the document and extract it using the re module?

if you want to extract IP, this could help -> http://stackoverflow.com/questions/2890896/extract-ip-address-from-an-html-string-python — Kumar Vikramjeet, May 03 '13 at 10:48

eandersson · Accepted Answer · 2013-05-03T11:02:07.150

3

You can use this regex for IPs: \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}

import re

text = "126.23.73.34"
match = re.search(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}', text)
if match:
    # group(0) is the whole match; this pattern has no numbered groups
    print("match.group(0):", match.group(0))
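Note that re.search stops at the first match. To pull every such pattern out of a longer document, re.findall with the same pattern is one option. A minimal sketch, where the sample text is made up for illustration:

```python
import re

# Hypothetical sample text standing in for the downloaded page.
text = "first 126.23.73.34, then 252.63.71.62, and a version string 1.2"

# Same loose pattern as above: four dot-separated runs of 1-3 digits.
pattern = re.compile(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}')

# The pattern has no capturing groups, so findall returns whole matches.
found = pattern.findall(text)
print(found)
```

This still accepts out-of-range values such as 555.555.555.555, since it only checks the digit counts.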

If you are looking for a complete regex to get IPv4 addresses you can find the most appropriate regex here.

To restrict all 4 numbers in the IP address to 0-255, you can use this one taken from the source above:

(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
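Because each octet in this pattern is a capturing group, re.findall returns tuples of the four octets rather than whole addresses. One way around that is re.finditer with group(0). The sketch below assumes the \b-anchored variant of the pattern and uses a made-up sample string:

```python
import re

# One octet restricted to 0-255, as a capturing group.
octet = r'(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'

# Four octets joined by literal dots, with word boundaries on both ends.
ipv4 = re.compile(r'\b' + r'\.'.join([octet] * 4) + r'\b')

# Hypothetical sample text.
text = "gateway 192.168.0.1, broadcast 192.168.0.254, bogus 555.555.555.555"

# findall yields one tuple of captured octets per match.
tuples = ipv4.findall(text)

# finditer yields match objects; group(0) is the full matched address.
whole = [m.group(0) for m in ipv4.finditer(text)]

print(tuples)
print(whole)
```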

edited May 03 '13 at 11:02

answered May 03 '13 at 10:50

eandersson

This is the correct regex for IPv4 addresses btw: `\b(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b` – tamasgal May 03 '13 at 10:57
Yep. I wasn't sure if he was looking for an IP, but I assumed as much. I'll include a link as a reference. – eandersson May 03 '13 at 10:58
I'm not sure the format of your answer result is what the OP would want, see: C:\wamp\www>Example.py ('192', '168', '0', '1') ('192', '168', '0', '254') – rcbevans May 03 '13 at 11:31
@o0rebelious0o try `print match.group()` – eandersson May 03 '13 at 11:45

jfs · Answer 2 · 2013-05-04T12:50:52.670

If it is HTML text, you could use an HTML parser (such as BeautifulSoup) to parse it, a regex to select strings that look like an IP, and the socket module to validate the IPs:

import re
import socket
from bs4 import BeautifulSoup # pip install beautifulsoup4

def isvalid(addr):
    try:
        socket.inet_aton(addr)
    except socket.error:
        return False
    else:
        return True

soup = BeautifulSoup(webpage, "html.parser") # webpage: the HTML source as a string
ipre = re.compile(r"\b\d+(?:\.\d+){3}\b") # matches some ips and more
ip_addresses = [ip for ips in map(ipre.findall, soup(text=ipre))
                for ip in ips if isvalid(ip)]

Note: this extracts IPs only from text nodes; for example, it ignores IPs in HTML attributes.
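If a standard-library-only check is preferred, the ipaddress module (Python 3.3+) could be swapped in for socket.inet_aton. This is a different validator from the one used above, shown only as a sketch; the helper name is_valid_ipv4 and the sample string are made up for illustration:

```python
import ipaddress
import re

def is_valid_ipv4(addr):
    """Return True if addr parses as a dotted-quad IPv4 address."""
    try:
        ipaddress.IPv4Address(addr)
    except ValueError:  # AddressValueError is a subclass of ValueError
        return False
    return True

# Same loose candidate pattern as above; the group is non-capturing,
# so findall returns whole matches.
candidate_re = re.compile(r"\b\d+(?:\.\d+){3}\b")

# Hypothetical sample text.
sample = "up: 10.0.0.7 and 172.16.254.1; not an address: 999.1.1.1"

valid = [ip for ip in candidate_re.findall(sample) if is_valid_ipv4(ip)]
print(valid)
```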

@Sazid: It is a library that you can use to extract info from HTML text. I've added link to its docs — jfs, May 04 '13 at 12:52

rcbevans · Answer 3 · 2013-05-03T11:29:52.650

0

You can use this. It will only accept VALID IP addresses:

import re
pattern = r"\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b"
text = "192.168.0.1 my other IP is 192.168.0.254 but this one isn't a real ip 555.555.555.555"
m = re.findall(pattern, text)
for i in m:
    print(i)

OUTPUT:

C:\wamp\www>Example.py
192.168.0.1
192.168.0.254

--Tested and working

edited May 03 '13 at 11:29

answered May 03 '13 at 10:58

rcbevans

Sure that works, but what if it is not a valid IP? e.g. 555.168.0.1? – eandersson May 03 '13 at 11:04
The question is, and I quote, "get a pattern of numbers like 126.23.73.34 from the document and extract it" It didn't say anything about actually validating the extracted values – rcbevans May 03 '13 at 11:08
That doesn't mean that other people won't look at this question a month, or year from now. It is always in the best interest of the community to provide the most complete answer possible. – eandersson May 03 '13 at 11:14
