Abusing String.replace function
I use a little trick with the replace function. Since replace loops through every match and lets us supply a callback function, the possibilities are endless. The result ends up in the output array.
var output = [];
var str = "Mary had a little lamb";
str.replace(/[A-Za-z]+(?=(\s[A-Za-z]+))/g, function ($0, $1) {
    output.push($0 + $1);
    return $0; // Actually we don't care. You don't even need to return
});
Since the output contains overlapping portions of the input string, we must not consume the next word while matching the current one. A look-ahead [1] takes care of that.
The regex /[A-Za-z]+(?=(\s[A-Za-z]+))/g does exactly what I described above: the [A-Za-z]+ portion (the start of the regex) consumes only one word at a time, while the look-ahead (?=(\s[A-Za-z]+)) [2] checks for the next word and captures it, together with the whitespace before it, without consuming it.
The function passed to replace receives the matched string as its first argument and the captured text in the subsequent arguments. (There are more arguments, such as the match offset and the whole input string. Check the documentation; I don't need them here.) Since the look-ahead is zero-width (the input is not consumed), the whole match is conveniently just the first word. The text captured inside the look-ahead goes into the 2nd argument.
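To make the argument order concrete, here is a small sketch that records what the callback receives for each match. The names seen, match and nextWord are only illustrative and do not appear in the code above.

```javascript
var str = "Mary had a little lamb";
var seen = []; // illustrative name: collects [whole match, captured text] pairs

str.replace(/[A-Za-z]+(?=(\s[A-Za-z]+))/g, function (match, nextWord) {
    // match    -> the current word only (the look-ahead consumes nothing)
    // nextWord -> text captured inside the look-ahead, including the leading space
    seen.push([match, nextWord]);
    return match;
});

console.log(seen);
// [["Mary", " had"], ["had", " a"], ["a", " little"], ["little", " lamb"]]
```

The last word, lamb, never shows up as a match because no following word satisfies the look-ahead.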
Proper solution with RegExp.exec
Note that the String.replace function incurs a replacement overhead: it builds a new string even though the replacement result is never used. If this is unacceptable, you can rewrite the above code with the RegExp.exec function in a loop:
var output = [];
var str = "Mary had a little lamb";
var re = /[A-Za-z]+(?=(\s[A-Za-z]+))/g;
var arr;
while ((arr = re.exec(str)) != null) {
    output.push(arr[0] + arr[1]);
}
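If the loop is needed in more than one place, it could be wrapped in a function along these lines. This is only a sketch: wordPairs is a hypothetical name, and the regex is created inside the function on the assumption that each call should start scanning from the beginning (a regex with the g flag keeps its position in lastIndex between exec calls).

```javascript
// Hypothetical helper: returns every pair of adjacent words in str.
function wordPairs(str) {
    // A fresh regex per call, so lastIndex always starts at 0.
    var re = /[A-Za-z]+(?=(\s[A-Za-z]+))/g;
    var pairs = [];
    var m;
    while ((m = re.exec(str)) !== null) {
        // m[0] is the current word, m[1] is the captured " nextWord".
        pairs.push(m[0] + m[1]);
    }
    return pairs;
}

console.log(wordPairs("Mary had a little lamb"));
// ["Mary had", "had a", "a little", "little lamb"]
```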
Footnote
[1] In other regex flavors that support variable-width look-behind, it is also possible to retrieve the previous word instead. JavaScript regex did not support look-behind when this was written; it was added in ES2018, so check your target engines before relying on it.
[2] (?=pattern) is the syntax for look-ahead.
Appendix
String.match can't be used here, since it ignores capturing groups when the g flag is used. The capturing group is necessary in the regex: we need the look-around to avoid consuming input and so match overlapping text, and the capture is the only way to read the text that the look-around saw.
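A quick way to see this for yourself, as a sketch (globalResult and singleResult are illustrative names):

```javascript
var str = "Mary had a little lamb";

// With the g flag, match returns only the whole matches; the captures are dropped.
var globalResult = str.match(/[A-Za-z]+(?=(\s[A-Za-z]+))/g);
console.log(globalResult); // ["Mary", "had", "a", "little"]

// Without the g flag, the capture is available, but only for the first match.
var singleResult = str.match(/[A-Za-z]+(?=(\s[A-Za-z]+))/);
console.log(singleResult[0] + singleResult[1]); // "Mary had"
```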