Why does negative lookback work this way in regex?

Question

The regex ((?!hede).)*$ matches sasha hede; it matched the part ede, which makes sense to me. But the regex ^((?!hede).)* only matches sasha<space>, whereas I expected it to match sasha hed. What am I missing?

Related: http://stackoverflow.com/questions/30900794/tempered-greedy-token-what-is-different-about-placing-the-dot-before-the-negat — raina77ow, Mar 16 '17 at 16:03
There are some regex engine optimizations when using the anchors `^` or `$`. The overall result is that given `$`, the position _starts_ there and decrements as far as possible to match. Same with `^` except it starts there, then increments its position as far as possible. That's the basic reason for the difference. And fwiw, the construct here is a _negative lookahead_ (not lookback). — , Mar 16 '17 at 16:18

raina77ow · Accepted Answer · 2017-03-16T16:00:47.647

This part...

((?!hede).)*

... is read as 'match any number of symbols, none of which is the start of a hede sequence'. In other words, you set up a rule that each character in the matched substring must satisfy.

In sasha hede, the first six characters, s, a, s, h, a, and (whitespace), match the description. The next h doesn't (it starts a hede sequence), so matching has to stop there.
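
To see this concretely, here is a small sketch. It assumes Python's re module (the question doesn't name an engine), and the variable names are made up for illustration. It lists the positions where the lookahead (?!hede) fails, then runs the start-anchored pattern:

```python
import re

text = "sasha hede"

# Positions where (?!hede) fails, i.e. where the substring "hede" begins.
blocked = [i for i in range(len(text)) if text.startswith("hede", i)]

# The start-anchored pattern consumes characters up to the first blocked position.
prefix = re.match(r"^((?!hede).)*", text).group(0)

print(blocked)       # [6]
print(repr(prefix))  # 'sasha '
```

Index 6 is the h that begins hede, so the repetition cannot step over it, and the match ends with the space at index 5.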

BTW, it's the same with the first pattern (bound to the end of the string): matching is stopped at the very first symbol excluded by the pattern. If that were not the case, the whole string would have been matched, not just ede.
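
The end-anchored case can be checked the same way. This is a rough sketch, again assuming Python's re module. A start position only works if no character between it and the end of the string begins hede, and the leftmost such position is the one right after the h:

```python
import re

text = "sasha hede"

# re.search tries start positions from left to right and returns the first
# one from which the pattern can reach the end of the string.
m = re.search(r"((?!hede).)*$", text)

print(m.start(), repr(m.group(0)))  # 7 'ede'
```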
