The backtrack is a lie.
It's just a consequence of how the regex101 debugger is implemented. It uses a PCRE feature (flag) called PCRE_AUTO_CALLOUT. This flag tells the PCRE engine to invoke a user-defined function at every step of matching. This function receives the current match status as input.
The catch is that PCRE doesn't tell the callout when it really backtracks. Regex101 has to infer that from the match status.
As you can see, in the step before the "backtrack" occurs, the current matched text is "a_", and just after you get out of the lookahead, it's reverted to "a". Regex101 notices that the matched text is shorter and therefore infers that a backtrack must have happened, with the confusing outcome you noticed.
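To make that inference concrete, here is a minimal Python sketch of the heuristic as described above. It is not regex101's actual code: the function name `infer_backtracks` is hypothetical, and the trace is hand-written, assuming each callout reports a pattern offset and the current position in the subject "a_b".

```python
# Hand-written trace of (pattern_offset, current_position) pairs. This is an
# assumption about what the auto-callouts report for /a(?=_)_b/ on "a_b";
# it was not captured from a real PCRE run.
SUBJECT = "a_b"
TRACE = [(0, 0), (1, 1), (4, 1), (5, 2), (6, 1), (7, 2), (8, 3)]


def infer_backtracks(subject, trace, start=0):
    """Flag a step as a 'backtrack' whenever the matched text gets shorter.

    The callout is never told about real backtracking, so the only signal
    available is the length of the text matched so far.
    """
    steps = []
    previous_length = 0
    for pattern_offset, current_position in trace:
        matched = subject[start:current_position]
        flagged = len(matched) < previous_length
        steps.append((pattern_offset, matched, flagged))
        previous_length = len(matched)
    return steps


STEPS = infer_backtracks(SUBJECT, TRACE)
for pattern_offset, matched, flagged in STEPS:
    note = "  <- reported as a backtrack" if flagged else ""
    print(f"+{pattern_offset} {matched!r}{note}")
```

The only flagged step is the one at pattern offset 6, right after the lookahead closes: the position is restored because lookaheads are zero-width, not because the engine tried and abandoned an alternative.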
For reference, here's the internal PCRE representation of the pattern with auto-callout enabled:
$ pcretest
PCRE version 8.38 2015-11-23
re> /a(?=_)_b/DC
------------------------------------------------------------------
  0  59 Bra
  3     Callout 255 0 1
  9     a
 11     Callout 255 1 5
 17  17 Assert
 20     Callout 255 4 1
 26     _
 28     Callout 255 5 0
 34  17 Ket
 37     Callout 255 6 1
 43     _
 45     Callout 255 7 1
 51     b
 53     Callout 255 8 0
 59  59 Ket
 62     End
------------------------------------------------------------------
Capturing subpattern count = 0
Options:
First char = 'a'
Need char = 'b'
As you can see, there's no branching opcode there, just an Assert.