3

I'm wondering whether it's possible to compare numeric values inside a regexp using Python's regexp system. Matching the pattern of an IP address is easy, but each group of 1-3 digits must not exceed 255, and that's where I'm a bit stumped.

dutt

8 Answers

9

No need for regular expressions here. Some background:

>>> import socket
>>> socket.inet_aton('255.255.255.255')
'\xff\xff\xff\xff'
>>> socket.inet_aton('255.255.255.256')
Traceback (most recent call last):
  File "<input>", line 1, in <module>
error: illegal IP address string passed to inet_aton
>>> socket.inet_aton('my name is nobody')
Traceback (most recent call last):
  File "<input>", line 1, in <module>
error: illegal IP address string passed to inet_aton

So:

import socket

def ip_address_is_valid(address):
    try: socket.inet_aton(address)
    except socket.error: return False
    else: return True

Note that addresses like '127.1' could be acceptable on your machine (there are systems, including MS Windows and Linux, where missing octets are interpreted as zero, so '127.1' is equivalent to '127.0.0.1', and '10.1.4' is equivalent to '10.1.0.4'). Should you require that there are always 4 octets, change the last line from:

else: return True

into:

else: return address.count('.') == 3
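
Putting the two variants together, a minimal sketch for Python 3 might look like the following. It assumes Python 3, where socket.error is an alias of OSError; the require_four_octets parameter is a hypothetical name introduced here for illustration and is not part of the code above.

```python
import socket


def ip_address_is_valid(address, require_four_octets=True):
    """Return True if inet_aton accepts the address.

    require_four_octets is a hypothetical flag: when True, shorthand
    forms such as '127.1' are rejected even where the platform's
    inet_aton would accept them.
    """
    try:
        socket.inet_aton(address)
    except OSError:  # socket.error is an alias of OSError in Python 3
        return False
    if require_four_octets:
        return address.count('.') == 3
    return True
```

With the flag left at its default, '127.1' is rejected on every platform; with the flag off, the result for such shorthand forms depends on the platform's inet_aton.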
tzot
  • I thought that zero-filled was only the case with IPv6 addresses and the :: notation? It is frequently the case that when writing IP addresses with a CIDR mask that only the octets touched by the mask are written (10/8, 172.16/12, 192.168/16, 192.168.127/24), with a "zero-fill" on the remaining octets. But eliding zero-octets in the middle I've never seen with IPv4. – Vatine Oct 25 '10 at 08:17
  • @Vatine: I assume you: a) downvoted my answer because you thought I was stating something mistaken, while I was stating something you didn't know b) you've never pinged 127.1 yourself; please, do try it. Not only I am mistaken, but the creator of your IP stack is mistaken too. We apologize. – tzot Oct 25 '10 at 08:32
  • @ΤΖΩΤΖΙΟΥ: My ping expands 127.1 into 127.0.0.1 but it was concocted in the Dark Tower of Redmond where adherence to standards is sometimes a little casual so that proves nuffin :-) – John Machin Oct 25 '10 at 08:55
  • @John: of course it proves nuffin :) *My* ping —which was concocted in the Evil Forces of \*nix— together with *your* ping just assert that 127.1 is *acceptable* … – tzot Oct 25 '10 at 09:34
  • Just because the implied empty fields are accepted by some systems does not mean they are correct. That is like saying a single 32-bit integer expressed in decimal is valid, or that hex should be accepted (because some interfaces accept hex for the quad values). – benc Oct 25 '10 at 15:08
  • @benc: no-one said “correct”; I said “acceptable”. I don't know which systems **don't** accept '127.1'; your “some known” might be “all known systems” for what it's worth. The question was about the validity of an IPv4 address, and the *validity* is defined by the software that uses the IPv4 address (whether it follows standards or not). If the only application mentioned is Python, then I report what is valid for Python, at least. Python **and** GNU ping **and** Windows ping **and** Firefox accept '127.0.0.1', '127.1', '2130706433', '0x7f000001' as equivalent. What was your point exactly? – tzot Oct 25 '10 at 15:41
  • @benc: I'm asking what was your point, because I don't see one, since my answer already has a clause to verify that the input `address` is a dotted address with four octets, as the question implies that it's needed. – tzot Oct 25 '10 at 15:44
  • Just tried pinging it, this is what the prompt came back with: "Name (127.1) is not a valid IP Address." – Vatine Oct 25 '10 at 15:56
  • In most situations, people ask for an implementation of an IPv4 address validator in a specific language. They rarely ask: "What kinds of address magic work in this particular language?" – benc Oct 25 '10 at 16:12
  • ΤΖΩΤΖΙΟΥ: Also, I should mention that often this magic is platform specific, so it might work in one place and not another. And the data might flow to other places that don't work. 2130706433 might work in Firefox, but fail in a proxy it points to. – benc Oct 25 '10 at 16:21
  • @ΤΖΩΤΖΙΟΥ: So far, I have tested it in IOS and ExtremeOS, neither understand 127.1 as a valid IP address and recommending non-universal parsing results is Not Recommended. – Vatine Oct 25 '10 at 16:26
  • @Vatine: so I assume you're a Mac OS/X user: it's a fact that, although OS/X is based on BSD, the code for parsing IP addresses has been rewritten, so maybe even Python on your system won't accept '127.1'. If you're not solely using Mac OS/X (especially since you've been a netadm), I'm interested in other systems too. However, I'd like to state here what irritated me most about your response: no-one knows everything, I'm sure you'll agree; given that, I strongly believe that dismissing something outside one's experience as false is arrogant and narrow-minded. – tzot Oct 25 '10 at 16:27
  • @Vatine: oh, come on! Are you intentionally putting words in my mouth? I did **not** recommend non-universal parsing results, and I never said it's **correct** (@benc); I said “acceptable” because Python (in the scope of the question) accepts alternative IPv4 addresses on MS Windows and on Linux, and that is a **fact**. And my answer **already** says: “should you require that there are always four octets”, so I got you covered. How is that answer not useful or incorrect? – tzot Oct 25 '10 at 16:32
  • I tried to get something of value out of this otherwise pointless discussion, and modified the text of my answer to make it unambiguous. – tzot Oct 25 '10 at 16:38
  • @benc: your objections would be valid if I *suggested* that one writes '127.1' instead of '127.0.0.1'. I don't know the scope and the intended use of IPv4 addresses by the OP, I only knew Python. Although my initial answer was correct, it was **not** unambiguous; I modified the text of the answer accordingly to acknowledge that fact. I welcome any further objections to what I *wrote*, but I can't respond to any objections to what one may *think* that I *meant*. – tzot Oct 25 '10 at 16:54
6

You need to check the allowed digits in each position of an octet. A three-digit octet may start with 0, 1, or 2. If it starts with 2, the second digit is limited to 0-5, and if it starts with 25, the third digit is limited to 0-5 as well. In every other case each digit may be 0-9.

I found this annotated example at http://www.regular-expressions.info/regexbuddy/ipaccurate.html :

\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b
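
As a rough sketch of how that pattern could be used from Python (Python 3.4+ is assumed for fullmatch; the names IP_RE, is_valid_ip and find_ips are made up for this example):

```python
import re

# The pattern quoted above, compiled once.
IP_RE = re.compile(
    r'\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}'
    r'(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b'
)


def is_valid_ip(text):
    """Validate a string that should contain nothing but one address."""
    return IP_RE.fullmatch(text) is not None


def find_ips(text):
    """Extract candidate addresses embedded in a larger block of text."""
    return IP_RE.findall(text)
```

Use fullmatch when the whole string has to be an address, and findall when pulling addresses out of surrounding text. Note that the pattern allows leading zeros such as '01'.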
JAL
  • I thought about doing something like that, very neat with an explanation though :) Thanks. – dutt Oct 25 '10 at 04:23
  • @Glenn Maynard if you already have the IP and need to check if it's valid, I wouldn't use a regex. If you need to extract potential IPs from blocks of other text, a regex would be useful. – JAL Oct 25 '10 at 07:28
  • It's a little bit of both, I need to use a regex for this, otherwise I need to add things to the framework and bladiblabla. – dutt Oct 25 '10 at 10:04
6

You can check a 4-octet IP address easily without regexes at all. Here's a tested working method:

>>> def valid_ip(ip):
...    parts = ip.split('.')
...    return (
...        len(parts) == 4
...        and all(part.isdigit() for part in parts)
...        and all(0 <= int(part) <= 255 for part in parts)
...        )
...
>>> valid_ip('1.2.3.4')
True
>>> valid_ip('1.2.3.4.5')
False
>>> valid_ip('1.2.   3   .4.5')
False
>>> valid_ip('1.256.3.4.5')
False
>>> valid_ip('1.B.3.4')
False
>>>
John Machin
3

The following supports both IPv4 and IPv6, and works on Python 2.7 as well as 3.3:

import socket


def is_valid_ipv4(ip_str):
    """
    Check the validity of an IPv4 address
    """
    try:
        socket.inet_pton(socket.AF_INET, ip_str)
    except AttributeError:
        try:
            socket.inet_aton(ip_str)
        except socket.error:
            return False
        return ip_str.count('.') == 3
    except socket.error:
        return False
    return True


def is_valid_ipv6(ip_str):
    """
    Check the validity of an IPv6 address
    """
    try:
        socket.inet_pton(socket.AF_INET6, ip_str)
    except socket.error:
        return False
    return True


def is_valid_ip(ip_str):
    """
    Check the validity of an IP address
    """
    return is_valid_ipv4(ip_str) or is_valid_ipv6(ip_str)
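
If only Python 3.3 or later needs to be supported, the standard-library ipaddress module is a possible alternative to the socket calls. This is a different technique from the code above, shown only as a sketch, and it assumes the input is a str:

```python
import ipaddress


def is_valid_ip(ip_str):
    """Return True if ip_str parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(ip_str)
    except ValueError:
        return False
    return True
```

Unlike inet_aton, ipaddress does not accept shorthand IPv4 forms such as '127.1', so no separate dot count is needed.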
Val Neekman
  • this is the best answer of all, but for windows users win_inet_pton must be installed, and the import statement should be changed to: `try: import win_inet_pton except ImportError: pass import socket` – StefanNch Nov 02 '14 at 10:45
3

Regex is for pattern matching, but to check for a valid IP, you need to check for the range (i.e. 0 <= n <= 255).

You may use a regex to check the range, but that would be overkill. I think you're better off matching the basic pattern first and then checking the range of each number.

For example, use the following pattern to match an IP:

([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})

Then check whether each number is within range.
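
A minimal sketch of that two-step check, assuming Python 3.4+ for fullmatch (the names IP_PATTERN and is_valid_ip are illustrative only):

```python
import re

# Step 1: the loose pattern from above, four groups of 1-3 ASCII digits.
IP_PATTERN = re.compile(
    r'([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})'
)


def is_valid_ip(text):
    match = IP_PATTERN.fullmatch(text)
    if match is None:
        return False
    # Step 2: every captured group must be within 0..255.
    return all(0 <= int(group) <= 255 for group in match.groups())
```

Because the pattern only admits digits, int() cannot raise here, so no try/except is needed.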

William Niu
0

You need this-

^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$

Rajesh Paul
0

IP addresses can also be checked with split, as follows:

all(map((lambda x: 0<=x<=255),map(int,ip.split('.')))) and len(ip.split("."))==4

For me that's a little bit more readable than a regex.

Sujoy
  • -2 **FAIL** on '1. 2 .3.4' and **FAIL** on '1.B.2.3' @dutt: be suspicious of anything with 2 * map() and excess parentheses and test before use. – John Machin Oct 25 '10 at 05:46
  • The "readable" expression can be reduced to the equivalent `all(map(lambda x: 0<=int(x)<=255,ip.split('.'))) and len(ip.split("."))==4` by removing a map() call and the redundant parentheses around the lambda expression (but still fails, of course) – John Machin Oct 25 '10 at 05:57
  • @John Machin, I accept that I didn't test for the possibility of spaces (my bad). But '1.B.2.3' test doesn't fail. I get a value error. `ValueError: invalid literal for int() with base 10: 'B'` – Sujoy Oct 25 '10 at 15:10
  • @JohnMachin So wrap it in a `try` `except` block and return `False` on `ValueError`. As for `1. 2 .3.4`, map over `ip.split()[0].split('.')`. And if you are worried about readability, wrap it in a function called `is_IP`. – Rúnar Berg Sep 30 '15 at 15:21
0

I think people are taking this too far. I suggest you first do this:

ips = re.findall(r'(?:\d{1,3})\.(?:\d{1,3})\.(?:\d{1,3})\.(?:\d{1,3})', page)

Then split each match on '.' and check that all four numbers are smaller than 256.
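
The two steps described here (a loose match, then a range check on each part) could be sketched as follows. The names LOOSE_IP and find_ips are hypothetical, and the dots are escaped so that they match only a literal '.':

```python
import re

# Loose pattern: four dot-separated groups of 1-3 ASCII digits.
LOOSE_IP = re.compile(r'[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}')


def find_ips(page):
    """Return the matches in page whose four parts are all below 256."""
    candidates = LOOSE_IP.findall(page)
    return [
        ip for ip in candidates
        if all(int(part) < 256 for part in ip.split('.'))
    ]
```

One caveat: without word boundaries the loose pattern can match the tail of a longer digit run, so surrounding text may still need extra care.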

Max