
I was trying to match IPv4 addresses using a regex and found the following one.

But I don't understand the ?: in it.

r'(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'

>>> import re
>>> re.findall(r'(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)', txt)
['254.123.11.13', '254.123.11.14', '254.123.12.13', '254.123.12.14', '254.124.11.13', '254.124.11.14', '254.124.12.13']

I know ?: makes a group non-capturing, but I can't make sense of it here.

Update: If I remove the ?:, I get the following result. I expected to get the IP address along with the captured groups in tuples.

>>> re.findall(r'((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)', txt)
[('11.', '11', '13'), ('11.', '11', '14'), ('12.', '12', '13'), ('12.', '12', '14'), ('11.', '11', '13'), ('11.', '11', '14'), ('12.', '12', '13')]

2 Answers


As I said in a comment, if you don't use non-capturing groups, re.findall returns the captured groups instead of the whole match. Since your regex has 3 groups, you get a 3-tuple for each IP.
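A minimal sketch of that re.findall rule, using a made-up sample string txt (the question's real input isn't shown): with no capturing groups each result is the whole match, and with capturing groups each result is a tuple of the group values.

```python
import re

# Hypothetical sample input; the question's actual txt is not shown.
txt = "a=254.123.11.13 b=254.124.12.14"

octet = r'(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'
non_capturing = r'(?:%s\.){3}%s' % (octet, octet)
capturing = r'((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'

# No capturing groups: each list item is the whole match (group 0).
whole = re.findall(non_capturing, txt)
# Three capturing groups: each list item is a 3-tuple of group values.
groups = re.findall(capturing, txt)

print(whole)   # ['254.123.11.13', '254.124.12.14']
print(groups)  # [('11.', '11', '13'), ('12.', '12', '14')]
```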

For a better demonstration, see the following state machines:

Without non-capturing groups:

((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)

[Regular expression visualization: Debuggex demo]

Using non-capturing groups:

(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)

[Regular expression visualization: Debuggex demo]

As you can see, when you use non-capturing groups there are no capturing groups at all, so the whole regex is returned as a single match, which is group 0.
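One way to check this directly, sketched below with a made-up input string: a compiled pattern's groups attribute counts only capturing groups, and group(0) on a match object is always the entire match, whether or not the pattern has capturing groups.

```python
import re

octet_nc = r'(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'  # non-capturing octet
octet_c = r'(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'     # capturing octet

with_nc = re.compile(r'(?:%s\.){3}%s' % (octet_nc, octet_nc))
with_c = re.compile(r'(%s\.){3}%s' % (octet_c, octet_c))

# Pattern.groups counts capturing groups only; (?:...) groups are not counted.
print(with_nc.groups)  # 0
print(with_c.groups)   # 3

# Hypothetical input string for illustration.
m = with_nc.search("ip=254.123.11.13")
# group(0) is always the entire match.
print(m.group(0))      # 254.123.11.13
print(m.groups())      # ()
```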

Mazdak

The non-capturing groups are needed here because a capturing group repeated with {3} keeps only the text from its last (third) repetition, not all three. An outer group around the whole repetition, ( q{3} ) where q is the regex for one number of the address plus its dot, captures the text of all three repetitions as one string, but still not each number separately. Making the inner groups non-capturing hides these partial captures, so findall returns the whole address.
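A short sketch of the repetition behaviour described above, run on a made-up fragment '254.123.11.' (the first three numbers plus dots): the repeated group reports only its last repetition, while an outer group spans all three.

```python
import re

q = r'(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'  # one number, capturing
text = '254.123.11.'                            # hypothetical fragment

# A capturing group repeated with {3} keeps only its last repetition.
inner = re.match(r'(%s\.){3}' % q, text)
print(inner.groups())  # ('11.', '11')

# Wrapping the repetition in an outer group captures all three repetitions
# as one string; the inner number group still holds only the last one.
outer = re.match(r'((?:%s\.){3})' % q, text)
print(outer.groups())  # ('254.123.11.', '11')
```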

Below is the regex without non-capturing groups, the problem it causes, and a solution.

q = r'(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'

Reproducing the {3} repeat problem without non-capturing groups:

t = r'(%s\.){3}%s' % (q, q)
>>> re.findall(t,txt)
[('11.', '11', '13'), ('11.', '11', '14')]

A solution if you want each number captured separately:

s = r'{0}\.{0}\.{0}\.{0}'.format(q)
>>> re.findall(s, txt)
[('254', '123', '11', '13'), ('254', '123', '11', '14')]

or

s = r'({0}\.{0}\.{0}\.{0})'.format(q)
>>> re.findall(s,txt)
[('254.123.11.13', '254', '123', '11', '13'), ('254.123.11.14', '254', '123', '11', '14')]
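As an alternative to adding an outer group, re.finditer yields match objects, so both the whole address (group 0) and the individual numbers are available from the same match. A minimal sketch, using a made-up txt since the question's input isn't shown:

```python
import re

q = r'(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'
pattern = r'{0}\.{0}\.{0}\.{0}'.format(q)

# Hypothetical sample input; the question's real txt is not shown.
txt = 'from 254.123.11.13 to 254.123.11.14'

# Each match object carries the full address (group 0) and groups 1-4.
results = [(m.group(0), m.groups()) for m in re.finditer(pattern, txt)]
for address, parts in results:
    print(address, parts)
# 254.123.11.13 ('254', '123', '11', '13')
# 254.123.11.14 ('254', '123', '11', '14')
```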
jack