To speak from the conclusion, I think look-behind is not implemented in JavaScript, since no one has any idea how it should behave, and existing implementations show that adding support for look-behind is rather complex.
JavaScript/ECMAScript is different from other languages in the sense that the specification includes an abstract implementation of the regex engine, while most other language only stops short at description of behavior of each pieces of regex syntax, with scant description of how different tokens interacts with each other.
Look-ahead? Easy to implement
The implementation of look-ahead is quite straight-forward. You only need to treat the pattern inside the look-ahead in the same manner as pattern outside look-ahead, and perform a left-to-right match as per usual, except that after the look-ahead succeeds 1) the current position is restored to before entering the look-ahead, and 2) choice points inside look-ahead are discarded after it is matched.
There is no limit to what can be included inside look-ahead, since it is a very simple extension to the existing natural left-to-right matching facilities.
Look-behind? Not so easy
On the other hand, the implementation of look-behind is not as straight forward.
Imagine how you would implement the following look-behind construct:
(?<=fixed-string)
(?<=a|fixed|string)
(?<=t[abc]{1,3})
(?<=(abc){2,6})
(?<=^.*abc.*)
(?<=\G"[^"]+");
(?<=^(.....|.......)+)
\b(\w+)\b(?<!\b\1\b.*\1)
Apart from the basic case (?<=fixed-string)
, which any look-behind implementation must support, (?<=a|fixed|string)
is a much desirable case to support.
Different regex engine has varied level of support for the regex above.
Let us look at how they are implemented in .NET and Java. (This is the two flavors whose look-behind behavior I have studied.)
.NET implementation
In Microsoft .NET implementation, all those regex above are valid, since .NET implements look-behind by using right-to-left mode, with the starting offset at the current position. The look-behind construct doesn't generate any choice point by itself.
However, if you use capturing groups inside the look-behind, it starts to get confusing, since the atoms in the patterns are interpreted from right-to-left, as demonstrated in this post. This is the disadvantage of this method: you would need to wrap your mind to think right-to-left when writing a look-behind.
Java implementation
In contrast, Java regex implementation implements look-behind by reusing the left-to-right matching facilities.
It first analyzes the pattern inside the look-behind for the minimum and maximum length of the pattern. Then, look-behind is implemented by trying to match the pattern inside from left-to-right, starting from (current position - minimum length)
to (current position - maximum length)
.
Is there anything missing? Yes! Since we are matching from left-to-right, we need to make sure that the match ends right at the position before entering the look-behind (current position
). In Java, this is implemented by appending a node at the end of the pattern inside look-behind.
This implementation is very inefficient, as there are maximum - minimum + 1
choice points created in the look-behind itself, before we even talk about choice points created by the pattern inside the look-behind.
The look-behind bound check is also inefficient, since it is placed at the end of the pattern, and can't prune choice points that are clearly hopeless (those already far exceeding the current position
in the middle of pattern).
Summary
As you can see, adding support for look-behind is not easy:
- The right-to-left approach seems reasonably efficient. However, it requires additional specification on the right-to-left matching behavior on other existing constructs.
- The approach to reuse left-to-right matching facilities is complex to specify, and is very inefficient. It also requires the introduction of pattern analysis into the specification, lest the performance is thrown out of the window.
(Note that I have not yet covered the behavior when look-behind is used inside look-ahead, and vice-versa. This should also be taken into consideration when defining semantics for look-behind construct).
These technical hurdles are also mentioned by Waldemar Horwat (who wrote the ES3 regex spec) in the mail cited in nils' answer:
No one has yet submitted a well-defined proposal for lookbehinds on the table. Lookbehinds are difficult to translate into the language used by the spec and get quite fuzzy when the order of evaluation of parts of the regexp matters, which is what happens if capturing parentheses are involved. Where do you start looking for the lookbehind? Shortest first, longest first, or reverse string match? Greedy or not? Backtrack into capturing results?