Python regex repetition error

Question

I want the result [('22', '254', '15', '36')] but I get [('15', '36')]. It looks as if the repeated part of my regex, (?:([0-1]?[0-9]{0,2}|2?[0-4]?[0-9]|25[0-5]?)\.){3}, does not capture three times.

import re

def fun(st):
    # Raw string so that \. reaches the regex engine unchanged.
    return re.findall(r"(?:([0-1]?[0-9]{0,2}|2?[0-4]?[0-9]|25[0-5]?)\.){3}([0-1]?[0-9]{0,2}|2?[0-4]?[0-9]|25[0-5]?)", st)

ip = "22.254.15.36"
print(fun(ip))  # prints [('15', '36')]

Python regex (and most regex engines) only returns last match for a group. Why don't you just split on `.` and then do whatever you need. — ctwheels, Jan 10 '18 at 17:46
Possible duplicate of [Regular Expressions in Python for dissecting IP](https://stackoverflow.com/questions/11593022/regular-expressions-in-python-for-dissecting-ip) — ctwheels, Jan 10 '18 at 17:49

Martijn Pieters · Answer 1 · 2018-01-10T22:44:21.680

You only have two capturing groups in your regex:

(?:    # non-capturing group
    (  # group 1
        [0-1]?[0-9]{0,2}|2?[0-4]?[0-9]|25[0-5]?
    )\.
){3}  
(      # group 2
        [0-1]?[0-9]{0,2}|2?[0-4]?[0-9]|25[0-5]?
)

That the first group can be repeated 3 times doesn't make it capture 3 times. The regex engine will only ever return 2 groups, and the last match in a given group will fill that group.
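A minimal sketch of that behaviour, using a simplified `\d+` octet instead of the pattern from the question (the simplification is an assumption made only to keep the example short):

```python
import re

# Simplified stand-in for the question's pattern: one repeated group, one final group.
m = re.match(r"(?:(\d+)\.){3}(\d+)", "22.254.15.36")

# Group 1 matched three times, but only its last iteration is kept.
print(m.groups())  # ('15', '36')

# Same effect with the smallest possible example.
print(re.match(r"(.){3}", "abc").group(1))  # c
```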

If you want to capture each of the parts of an IP address into separate groups, you'll have to explicitly define groups for each:

pattern = (
    r'([0-1]?[0-9]{0,2}|2?[0-4]?[0-9]|25[0-5]?)\.'
    r'([0-1]?[0-9]{0,2}|2?[0-4]?[0-9]|25[0-5]?)\.'
    r'([0-1]?[0-9]{0,2}|2?[0-4]?[0-9]|25[0-5]?)\.'
    r'([0-1]?[0-9]{0,2}|2?[0-4]?[0-9]|25[0-5]?)')

def fun(st, p=re.compile(pattern)):
    return p.findall(st)

You could avoid that much repetition with a little string and list manipulation:

octet = r'([0-1]?[0-9]{0,2}|2?[0-4]?[0-9]|25[0-5]?)'
pattern = r'\.'.join([octet] * 4)
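As a quick check, the joined pattern should compile to four capturing groups and return all four octets for the address from the question. A short sketch (the name `compiled` is only illustrative):

```python
import re

octet = r'([0-1]?[0-9]{0,2}|2?[0-4]?[0-9]|25[0-5]?)'
pattern = r'\.'.join([octet] * 4)

compiled = re.compile(pattern)
print(compiled.groups)                   # 4 capturing groups
print(compiled.findall("22.254.15.36"))  # [('22', '254', '15', '36')]
```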

Next, the pattern will just as happily match only the 25 portion of 255, because alternatives are tried left to right and the first one that succeeds wins. It is better to try the 200-255 range first, before the alternatives for smaller numbers:

octet = r'(2(?:5[0-5]|[0-4]\d)|[01]?[0-9]{1,2})'
pattern = r'\.'.join([octet] * 4)
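To see the difference, here is a small comparison of the two octet patterns on an address that ends in 255. The helper name `octets` and the variable names `old_octet` / `new_octet` are made up for this sketch:

```python
import re

old_octet = r'([0-1]?[0-9]{0,2}|2?[0-4]?[0-9]|25[0-5]?)'
new_octet = r'(2(?:5[0-5]|[0-4]\d)|[01]?[0-9]{1,2})'

def octets(octet, text):
    """Match four dot-separated octets at the start of text (no end anchor)."""
    m = re.match(r'\.'.join([octet] * 4), text)
    return m and m.groups()

# The old pattern stops after "25"; the reordered one consumes "255".
print(octets(old_octet, "001.199.249.255"))  # ('001', '199', '249', '25')
print(octets(new_octet, "001.199.249.255"))  # ('001', '199', '249', '255')
```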

This still allows leading 0 digits, by the way.

If all you are doing is passing in single IP addresses, then re.findall() is overkill. Just use p.match() (matching only at the string start) or p.search(), and return the .groups() result if there is a match:

def fun(st, p=re.compile(pattern + '$')):
    match = p.match(st)
    return match and match.groups()
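For illustration, this is how the anchored version behaves on a valid and an out-of-range address. The snippet repeats the definitions so it runs on its own; the sample inputs are my own:

```python
import re

octet = r'(2(?:5[0-5]|[0-4]\d)|[01]?[0-9]{1,2})'
pattern = r'\.'.join([octet] * 4)

def fun(st, p=re.compile(pattern + '$')):
    match = p.match(st)
    # None when there is no match, otherwise a tuple of the four octets.
    return match and match.groups()

print(fun("22.254.15.36"))   # ('22', '254', '15', '36')
print(fun("22.254.15.256"))  # None
```

One thing to be aware of: `$` also matches just before a trailing newline, so `"22.254.15.36\n"` is accepted as well. If that matters, `p.fullmatch()` on the pattern without `$` is stricter.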

Note that no validation is done on the surrounding data. If you are trying to extract IP addresses from a larger body of text, you can't use re.match() and can't add the $ anchor, and the match could be taken from a longer run of octets (e.g. 22.22.22.22.22.22). You'd have to add some look-around operators for that:

# only match an IP address if there is no indication that it is part of a larger
# set of octets; no leading or trailing dot or digits
pattern = r'(?<![\.\d])' + pattern + r'(?![\.\d])'
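A short sketch of the look-arounds at work on free text. The sample sentence and the name `guarded` are assumptions for the example; note that an address directly followed by a full stop would also be rejected by this lookahead:

```python
import re

octet = r'(2(?:5[0-5]|[0-4]\d)|[01]?[0-9]{1,2})'
pattern = r'\.'.join([octet] * 4)
guarded = r'(?<![\.\d])' + pattern + r'(?![\.\d])'

text = "host 22.254.15.36 seen, ignore 22.22.22.22.22.22"

# Without look-arounds the six-octet run yields a bogus second hit.
print(len(re.findall(pattern, text)))  # 2

# With look-arounds only the stand-alone address is returned.
print(re.findall(guarded, text))       # [('22', '254', '15', '36')]
```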

`(?<![.\d])` is nicer than `(?<!\.|\d)` -> same for lookahead and it's quicker (you don't need to escape the `.` in the set) — ctwheels, Jan 10 '18 at 18:15
@ctwheels: thanks, that is indeed a better option. How I wish the Python regex compiler would optimise sometimes. :-) — Martijn Pieters, Jan 10 '18 at 18:16
Sorry, was just editing my own and hadn't had a chance to explain: `001.199.249.255` matches `25` instead of `255` — ctwheels, Jan 10 '18 at 18:40
@ctwheels: ah, yes, of course. I've adjusted the pattern to account for that. — Martijn Pieters, Jan 10 '18 at 22:44
you can use the pattern in my answer if you want. I optimized it for better performance, but that’s loads better — ctwheels, Jan 10 '18 at 22:49

score 1 · Accepted Answer · edited Jun 20 '20 at 09:12

Overview

As I mentioned in the comments below your question, most regex engines only capture the last match. So when you do (...){3}, only the last match is captured: E.g. (.){3} used against abc will only return c.

Also, note that changing your regex to (2[0-4]\d|25[0-5]|[01]?\d{1,2}) performs much better and catches full numbers (currently you'll grab 25 instead of 255 on the last octet for example - unless you anchor it to the end).

To give you a fully functional regex for capturing each octet of the IP:

(2[0-4]\d|25[0-5]|[01]?\d{1,2})\.(2[0-4]\d|25[0-5]|[01]?\d{1,2})\.(2[0-4]\d|25[0-5]|[01]?\d{1,2})\.(2[0-4]\d|25[0-5]|[01]?\d{1,2})

Personally, however, I'd separate the logic from the validation. The code below first validates the format of the string and then, while splitting the string on `.`, checks the logic (no octet greater than 255).

Code


import re
ip='22.254.15.36'
if re.match(r"(?:\d{1,3}\.){3}\d{1,3}$", ip):
    print([octet for octet in ip.split('.') if int(octet) < 256])

Result: ['22', '254', '15', '36']
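Note that the list comprehension above drops an out-of-range octet rather than rejecting the address. If the whole address should be rejected instead, a variant could look like the sketch below. The function name `split_ip` is hypothetical, and `re.fullmatch()` is swapped in for `re.match()` with `$` so that a trailing newline is not accepted:

```python
import re

def split_ip(ip):
    """Return the four octets as strings, or None if ip is not a valid dotted quad."""
    # Step 1: format check only (four groups of 1-3 digits).
    if not re.fullmatch(r"(?:\d{1,3}\.){3}\d{1,3}", ip):
        return None
    # Step 2: range check; a single octet above 255 invalidates the address.
    octets = ip.split('.')
    if all(int(octet) < 256 for octet in octets):
        return octets
    return None

print(split_ip('22.254.15.36'))   # ['22', '254', '15', '36']
print(split_ip('22.254.15.300'))  # None
```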

If you're using this method to extract IPs from an arbitrary string, you can replace re.match() with re.search() or re.findall(). In that case you may want to remove $ and add some logic to ensure you're not matching special cases like 11.11.11.11.11: (?<!\d\.)\b(?:\d{1,3}\.){3}\d{1,3}\b(?!\.\d)
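A possible way to combine that pattern with the range check when scanning free text. This is a sketch; `IP_CANDIDATE`, `extract_ips` and the sample text are made-up names and data:

```python
import re

# Format-only candidate pattern, with guards against longer dotted runs.
IP_CANDIDATE = re.compile(r"(?<!\d\.)\b(?:\d{1,3}\.){3}\d{1,3}\b(?!\.\d)")

def extract_ips(text):
    """Return candidates whose octets are all in the range 0-255."""
    return [
        candidate
        for candidate in IP_CANDIDATE.findall(text)
        if all(int(octet) < 256 for octet in candidate.split('.'))
    ]

text = "ok 10.0.0.1, too long 11.11.11.11.11, out of range 300.1.1.1"
print(extract_ips(text))  # ['10.0.0.1']
```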

score 0 · Answer 3 · edited Jun 20 '20 at 09:12

I encountered a very similar issue. I found two solutions, using the official documentation. The answer of @ctwheels above did mention the cause of the problem, and I really appreciate it, but it did not provide a solution. Even when trying the lookbehind and the lookahead, it did not work.

First solution: re.finditer

re.finditer iterates over match objects, and you can call each one's group() method:

    >>> import re
    >>> ip = "22.254.15.36"
    >>> def fun(st):
    ...     pr = re.finditer(r"(?:([0-1]?[0-9]{0,2}|2?[0-4]?[0-9]|25[0-5]?)\.){3}([0-1]?[0-9]{0,2}|2?[0-4]?[0-9]|25[0-5]?)", st)
    ...     for p in pr:
    ...         print(p.group(), end="")
    ...
    >>> fun(ip)
    22.254.15.36

Another solution: you can still use findall, but you'll have to make every group a non-capturing group. The main problem is not findall itself but the capturing group, which only keeps its last match, and findall returns groups whenever the pattern contains any:

"re.findall:

...If one or more groups are present in the pattern, return a list of groups"

(Python 3.8 Manuals)

So:

    >>> def fun(st):
    ...     print(re.findall(r"(?:(?:[0-1]?[0-9]{0,2}|2?[0-4]?[0-9]|25[0-5]?)\.){3}(?:[0-1]?[0-9]{0,2}|2?[0-4]?[0-9]|25[0-5]?)", st))
    ...
    >>> fun(ip)
    ['22.254.15.36']
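Both variants above return the address as one string. If the separate octets are wanted, as in the question, one option is to split each whole match on `.`. A small sketch (the names `IP` and `octets_per_match` are made up for illustration):

```python
import re

# Same pattern as above, with every group non-capturing.
IP = (r"(?:(?:[0-1]?[0-9]{0,2}|2?[0-4]?[0-9]|25[0-5]?)\.){3}"
      r"(?:[0-1]?[0-9]{0,2}|2?[0-4]?[0-9]|25[0-5]?)")

def octets_per_match(st):
    """Return one tuple of octet strings per address found in st."""
    return [tuple(m.group().split('.')) for m in re.finditer(IP, st)]

print(octets_per_match("22.254.15.36"))  # [('22', '254', '15', '36')]
```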

Have fun !
