This significantly-improved answer owes itself to @EliGassert.
String.prototype.match_overlap = function(re)
{
if (!re.global)
re = new RegExp(re.source,
'g' + (re.ignoreCase ? 'i' : '')
+ (re.multiline ? 'm' : ''));
var matches = [];
var result;
while (result = re.exec(this))
matches.push(result),
re.lastIndex = result.index + 1;
return matches.length ? matches : null;
}
@EliGassert points out that there is no need to walk through the entire string character by character; instead we can find a match anywhere (i.e. do without the anchor), and then continue one character after the index of the found match. While researching how to retrieve said index, I found that the re.lastIndex
property, used by exec
to keep track of where it should continue its search, is in fact settable! This works rather nicely with what we intend to do.
The only bit needing further explanation might be the beginning. In the absence of the g
flag, exec
may never return null
(always returning its one match, if it exists), thus possibly going into an infinite loop. Since, however, match_overlap
by design seeks multiple matches, we can safely recompile any non-global RegExp
as a global RegExp
, importing the i
and m
options as well if set.
Here is a new jsFiddle: http://jsfiddle.net/acheong87/h5MR5/.
document.write("<pre>");
document.write('sasas'.match_overlap(/sas/));
document.write("\n");
document.write('aaaa'.match_overlap(/aa/));
document.write("\n");
document.write('my1name2is3pilchard'.match_overlap(/[a-z]{2}[0-9][a-z]{2}/));
document.write("</pre>");
Output:
sas,sas
aa,aa,aa
my1na,me2is,is3pi