I was testing the regex `(?=\d)\w(?=\d)` on the string `123abc456`. My expectation was that the positive lookahead `(?=\d)` should match `1` and discard it (since lookaheads are zero-length assertions), then `\w` should match `2`, and the second positive lookahead should just match `3` and discard it. Thus, we would have an overall match of just `2`.

However, pythex shows the matches as `12` and `45`. Where could I be going wrong? Thanks!

MathMan
  • Precisely *because* the lookahead is zero-length, the `1` is not consumed by `(?=\d)`, but only matched. – chepner Jan 05 '23 at 15:31
  • @chepner so `1` shouldn't be included in the overall match? – MathMan Jan 05 '23 at 15:34
  • Why do you think first `1` should not be matched when it satisfies all the assertions? – anubhava Jan 05 '23 at 15:34
  • `(?=\d)\w(?=\d)` = `\d(?=\d)` [matches](https://regex101.com/r/Uv49cN/1) a digit followed by another digit. – Wiktor Stribiżew Jan 05 '23 at 15:35
  • What are you testing with? `re.findall(r"(?=\d)\w(?=\d)", "123abc456")` finds 4 matches, `1`, `2`, `4`, and `5`, being the 4 characters that both are digits and are followed by digits. – chepner Jan 05 '23 at 15:35
  • `(?=\d)` matches but does not consume `1`, then `\w` both matches *and* consumes `1`, and `(?=\d)` matches but does not consume `2`. That leaves `2` as the first character in the string to which `re.findall` applies the regular expression again. – chepner Jan 05 '23 at 15:37
  • *"`(?=\d)` should match `1` and discard it (since lookaheads are zero length assertions) and then `\w` should match `2`"* - No, exactly because it's non-consuming, the `\w` will match `1`... – Tomerikoo Jan 05 '23 at 15:37
  • The match result at pythex.org is ambiguous, as the results for your regex and `(?=\d)\w\w(?=\d)` look identical. – chepner Jan 05 '23 at 15:41
  • The linked duplicate doesn't explain how a zero-length lookahead works, which is the crux of the OP's question. – chepner Jan 05 '23 at 15:51
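
The points made in the comments can be checked with a short script. This is only a sketch using Python's standard `re` module; the helper name `spans` and the variable names are made up for illustration and are not from the original post. It prints each match together with its start and end offsets, which shows that the question's regex produces four adjacent one-character matches, while `(?=\d)\w\w(?=\d)` produces two two-character matches covering the same text. That is presumably why the highlighted output on pythex looks like `12` and `45` in both cases.

```python
import re

text = "123abc456"

# The regex from the question: a word character that is a digit
# (first lookahead, checked at the same position) and is followed
# by a digit (second lookahead, checked after the character).
single = re.compile(r"(?=\d)\w(?=\d)")

# The variant mentioned in the comments, which consumes two characters.
double = re.compile(r"(?=\d)\w\w(?=\d)")


def spans(pattern, s):
    """Return (matched_text, (start, end)) for every match of pattern in s."""
    return [(m.group(), m.span()) for m in pattern.finditer(s)]


single_matches = spans(single, text)
double_matches = spans(double, text)

print(single_matches)  # four one-character matches: 1, 2, 4, 5
print(double_matches)  # two two-character matches: 12, 45
```

Because a lookahead does not move the match position, `\w` starts at the same character that the first `(?=\d)` inspected. That is why `1` is part of the first match instead of being skipped.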
