2

A personal project that requires regular expressions for IP addresses led me to the following problem.

import re

pattern = r'123\.145\.167\.[0-9]{1,2}'
source = "123.145.167.0, 123.145.167.99, 123.145.167.100"
# search() returns only the first match in the string
n = re.search(pattern, source)
print(n.group())


import re

pattern = r'123\.145\.167\.[0-9]{1,2}'
source = "123.145.167.0, 123.145.167.99, 123.145.167.100"
# findall() returns every non-overlapping match as a list
n = re.compile(pattern)
print(n.findall(source))

search matches only the first address in the source string, while findall also returns a truncated match for the last address, giving this output:

['123.145.167.0', '123.145.167.99', '123.145.167.10']

Is it possible to obtain matches for both 123.145.167.0 and 123.145.167.99, but not for 123.145.167.100?

I have already gone through "python - regex search and findall" and am still not able to understand how to solve my problem.

surya
  • You may want to understand what word boundaries are. Moreover, there is a basic difference between `re.search()` and `re.findall()`. – devnull May 26 '14 at 06:03

3 Answers

1

Throw a word boundary on the end: \b.

import re

# The trailing \b requires the match to end at a word boundary
pattern = r'123\.145\.167\.[0-9]{1,2}\b'
source = "123.145.167.0, 123.145.167.99, 123.145.167.100"
n = re.compile(pattern)
print(n.findall(source))

Gives:

['123.145.167.0', '123.145.167.99']
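
To see why the boundary matters, here is a small side-by-side sketch using the source string from the question. The variable names `loose` and `strict` are only illustrative. `\b` matches at a position where a word character (letter, digit or underscore) sits next to a non-word character or the edge of the string, so it cannot match between the two trailing digits of `100`.

```python
import re

source = "123.145.167.0, 123.145.167.99, 123.145.167.100"

# Without a boundary, {1,2} is satisfied after two digits,
# so the regex happily stops in the middle of "100".
loose = re.findall(r'123\.145\.167\.[0-9]{1,2}', source)

# With \b, the match must end where a digit is NOT followed by
# another word character, so "10" inside "100" is rejected.
strict = re.findall(r'123\.145\.167\.[0-9]{1,2}\b', source)

print(loose)   # ['123.145.167.0', '123.145.167.99', '123.145.167.10']
print(strict)  # ['123.145.167.0', '123.145.167.99']
```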
1

You can use a lookahead assertion:

pattern = r'123\.145\.167\.[0-9]{1,2}(?=[^0-9]|$)'

The part

(?=[^0-9]|$)

means that the match must be followed by either a non-digit character or the end of the string. This check does not consume any characters; it only decides whether the expression matches. With this approach findall returns the result you're looking for.

From the documentation:

(?=...) Matches if ... matches next, but doesn’t consume any of the string. This is called a lookahead assertion. For example, Isaac (?=Asimov) will match 'Isaac ' only if it’s followed by 'Asimov'.
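
As a rough illustration, the sketch below runs the lookahead pattern against the question's source string. It also tries a negative lookahead, `(?![0-9])`, which should behave the same way here; that variant and the `tagged` input are my own additions for comparison, not part of the original answer. The last two lines show one case where the lookahead and `\b` differ: when the address is followed directly by a letter.

```python
import re

source = "123.145.167.0, 123.145.167.99, 123.145.167.100"

# Positive lookahead: next char is a non-digit, or we are at the end.
lookahead = re.compile(r'123\.145\.167\.[0-9]{1,2}(?=[^0-9]|$)')
# Negative lookahead (assumed equivalent here): next char is not a digit.
negative = re.compile(r'123\.145\.167\.[0-9]{1,2}(?![0-9])')

matches = lookahead.findall(source)
print(matches)                              # ['123.145.167.0', '123.145.167.99']
print(negative.findall(source) == matches)  # True

# Hypothetical input: an address immediately followed by letters.
tagged = "123.145.167.99abc"
after_letter = lookahead.findall(tagged)
with_boundary = re.findall(r'123\.145\.167\.[0-9]{1,2}\b', tagged)
print(after_letter)   # ['123.145.167.99'] since 'a' is a non-digit
print(with_boundary)  # [] since there is no word boundary between '9' and 'a'
```

Which behaviour is preferable depends on whether text like `99abc` should count as a valid address in your data.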

6502
0

You need to define a boundary for your match, because 123.145.167.10 is contained within 123.145.167.100. You can use the \b anchor to define a boundary on both sides of the pattern.

r"\b123\.145\.167\.[0-9]{1,2}\b"
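
The leading `\b` does not change the result for the question's source string, but it may matter for other input. The sketch below uses a made-up string (`source`, with a stray leading digit on the first address) to show the difference; the variable names are only illustrative.

```python
import re

# Hypothetical input: the first "address" starts with an extra digit.
source = "1123.145.167.5, 123.145.167.7"

# Trailing boundary only: the pattern can start in the middle of "1123".
trailing_only = re.findall(r'123\.145\.167\.[0-9]{1,2}\b', source)

# Boundary on both sides: "123" must start at a word boundary,
# so the match inside "1123" is rejected.
both_sides = re.findall(r'\b123\.145\.167\.[0-9]{1,2}\b', source)

print(trailing_only)  # ['123.145.167.5', '123.145.167.7']
print(both_sides)     # ['123.145.167.7']
```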
olyv
DevKeh