
I'm new to Python regex and am learning about lookbehind assertions.

I found the following behavior strange. Could someone explain how it works?

import regex as re
re.search(r'(\d*)(?<=a)(\.)', '1a.')
<regex.Match object; span=(2, 3), match='.'>

re.search(r'(\d+)(?<=a)(\.)', '1a.')
(outputs nothing)

Why doesn't the second one match anything?

QHarr
user8641707

1 Answer

The first pattern:

re.search('(\d*)(?<=a)(\.)', '1a.')

says to find zero or more digits, followed by a dot. Right before the dot there is a positive lookbehind, which asserts that the previous character is an a. A lookbehind is zero-width: it checks the text but consumes nothing. In this case, Python matches zero digits (starting at index 2), followed by a single dot. The lookbehind succeeds because the preceding character is in fact an a.
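To see what each group captured in the first pattern, here is a minimal sketch. It assumes the standard-library `re` module behaves the same as the third-party `regex` module for this particular pattern; the variable names are only illustrative.

```python
import re  # assumption: stdlib re gives the same result as `regex` here

# \d* is allowed to match the empty string, so the match can start at index 2.
m = re.search(r'(\d*)(?<=a)(\.)', '1a.')

first_span = m.span()      # where the overall match sits in '1a.'
first_groups = m.groups()  # what (\d*) and (\.) captured

print(first_span)    # (2, 3)
print(first_groups)  # ('', '.')
```

Group 1 is the empty string: the digit `1` is never part of the match, because the lookbehind can only succeed at the position right after the `a`.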

However, the second pattern:

re.search('(\d+)(?<=a)(\.)','1a.')

matches one or more digits, followed by the lookbehind and the dot. In this case, Python is compelled to match the digit 1. But then the lookbehind must fail: if the last character matched was a digit, the character before the current position cannot be the letter a. So no match is possible in the second case. Even if we were to remove (?<=a) from the second pattern, it would still fail, because nothing in the pattern accounts for the letter a between the 1 and the dot.
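The failure can be checked directly with a small sketch, again assuming the standard-library `re` module stands in for `regex`. The third pattern, which consumes the `a` explicitly, is a hypothetical alternative added for illustration and is not part of the original question.

```python
import re  # assumption: stdlib re gives the same result as `regex` here

text = '1a.'

# \d+ must consume '1', so the lookbehind then sees '1', not 'a'.
with_lookbehind = re.search(r'(\d+)(?<=a)(\.)', text)

# Dropping the lookbehind does not help: '1' is followed by 'a', not '.'.
without_lookbehind = re.search(r'(\d+)(\.)', text)

# Hypothetical fix: consume the 'a' between the digit and the dot.
consuming_a = re.search(r'(\d+)a(\.)', text)

print(with_lookbehind)       # None
print(without_lookbehind)    # None
print(consuming_a.groups())  # ('1', '.')
```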

Tim Biegeleisen
  • I thought the matching principle was to test the pattern pieces one by one, e.g. `1` matches `(\d+)`, and `a` satisfies `(?<=a)`, since it is right before the dot `.` matched by `(\.)`. So I thought the expression should return something: `.` or `1 .` (`(?<=a)` would not return a value). – user8641707 Jan 01 '18 at 04:01
  • Now I think the reason may be that the expression cannot return just `.`: it wants to return a `1` because of `(\d+)`, and it also wants to return `.` because there is an `a` right before `.`. So the expression attempts `1 .`, but cannot, because the `1` and `.` are not adjacent. So it returns nothing at all. – user8641707 Jan 01 '18 at 04:01
  • It matches a single digit `1`, but then hits a lookbehind before the dot, insisting that an `a` came just before that position. Obviously this fails, and to make matters worse there is an actual letter `a` there which is not matched by anything. – Tim Biegeleisen Jan 01 '18 at 04:03
  • `re.search('(\d*)(?<=a)(\.)', '1a.')` succeeds because it can return just `.`, since `(\d*)` is allowed to match zero digits. – user8641707 Jan 01 '18 at 04:06