TL;DR
A one-liner regex solution:
(?:\bis\b)?((?:(?!\bis\b)(?:What\s+is\b)?.?)+)
JS code:
var re = /(?:\bis\b)?((?:(?!\bis\b)(?:What\s+is\b)?.?)+)/;
console.log(
    "What is this and is What is this is cool What is this is"
        .split(re)        // captured chunks are spliced into the result
        .filter(Boolean)  // drop the empty strings left between matches
);
Whole philosophy
This regex matches the word `is` as soon as it reaches one; otherwise it keeps consuming either a `What is` occurrence or any other single character, as long as the text ahead is not a standalone `is`. It matches and captures everything other than `is`.
The trick here is using a tempered greedy token to check whether the next word is `is`. If it is not, the regex tries to match a `What is` or a single character. This repeats until a standalone `is` is reached.
`(?:\bis\b)?` Try to match the word `is`
`(` Start of capturing group #1
`(?:` Start of non-capturing group
`(?!\bis\b)` Assert that the next word is not `is`
`(?:What\s+is\b)?.?` If the assertion holds, optionally match `What is`, followed by at most one more character
`)+` Repeat as many times as possible
`)` End of capturing group #1
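To see the tempered token at work in isolation, the following sketch compares the bare token with the variant that exempts `What is`. The names `bareToken` and `exemptToken` and the sample sentence are made up for this illustration; they are not part of the original answer.

```javascript
// Bare tempered token: consume characters until a standalone "is" comes next.
const bareToken = /(?:(?!\bis\b).)+/;
// Same token, but a "What is" is swallowed whole before the next check.
const exemptToken = /(?:(?!\bis\b)(?:What\s+is\b)?.?)+/;

const sample = "What is this is cool";
const bareMatch = sample.match(bareToken)[0];
const exemptMatch = sample.match(exemptToken)[0];

console.log(JSON.stringify(bareMatch));   // "What "
console.log(JSON.stringify(exemptMatch)); // "What is this "
```

The bare token stops in front of the very first `is`, while the exempting variant steps over `What is` and only stops at the second one. The `is` inside `this` never triggers the lookahead because there is no word boundary in front of it.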
The `split()` method adds the text matched by capturing groups to the output array, which is why the regex matches `is` outside the group and captures everything else inside it.
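A small sketch of that `split()` behaviour, first on a toy separator and then on the regex from this answer before `filter(Boolean)` is applied. The variable names and the input strings are illustrative assumptions.

```javascript
// Without a capturing group the separators are discarded.
const withoutGroup = "a-b-c".split(/-/);  // ["a", "b", "c"]
// With a capturing group the captured text is kept in the result.
const withGroup = "a-b-c".split(/(-)/);   // ["a", "-", "b", "-", "c"]

// The same rule applied to the regex from this answer.
const splitter = /(?:\bis\b)?((?:(?!\bis\b)(?:What\s+is\b)?.?)+)/;
const raw = "this is cool".split(splitter);
// raw is ["", "this ", "", " cool", ""]: every match is a separator,
// so the pieces between separators are empty and only the captures carry text.
const cleaned = raw.filter(Boolean);      // ["this ", " cool"]

console.log(withoutGroup, withGroup, raw, cleaned);
```

The empty strings in `raw` are the reason the original snippet ends with `.filter(Boolean)`.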
To avoid splitting after words other than `What`, you only need to add each word to an alternation inside a group:
(?:\bis\b)?((?:(?!\bis\b)(?:(?:What|How|Who)\s+is\b)?.?)+)
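If the list of protected words changes often, the pattern could be assembled from an array. This is only a sketch: `buildSplitter` is a hypothetical helper, not something from the original answer, and it assumes the words contain no regex metacharacters because they are not escaped.

```javascript
// Hypothetical helper: builds the splitter for a list of protected words.
// Assumption: the words contain no regex metacharacters (nothing is escaped).
function buildSplitter(words) {
  const alternation = words.join("|");
  return new RegExp(
    "(?:\\bis\\b)?((?:(?!\\bis\\b)(?:(?:" + alternation + ")\\s+is\\b)?.?)+)"
  );
}

const protectedSplit = "Who is this is How is that"
  .split(buildSplitter(["What", "How", "Who"]))
  .filter(Boolean);

console.log(protectedSplit); // ["Who is this ", " How is that"]
```

Both `Who is` and `How is` survive intact, and the only split happens at the standalone `is` between them.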
You may also want to set the `i` flag so that a lowercase `what is` is protected too; without it, the `is` in `what is` triggers a split.
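The effect of the flag can be checked with a lowercase input. The sketch below uses the single-word pattern from the top of this answer; the variable names and the sample string are illustrative.

```javascript
const caseSensitive = /(?:\bis\b)?((?:(?!\bis\b)(?:What\s+is\b)?.?)+)/;
const caseInsensitive = /(?:\bis\b)?((?:(?!\bis\b)(?:What\s+is\b)?.?)+)/i;

const lower = "what is this is cool";

// Without the flag, "what is" is not recognised, so its "is" splits too.
const strictParts = lower.split(caseSensitive).filter(Boolean);
// With the flag, "what is" is protected like "What is".
const relaxedParts = lower.split(caseInsensitive).filter(Boolean);

console.log(strictParts);  // ["what ", " this ", " cool"]
console.log(relaxedParts); // ["what is this ", " cool"]
```

Note that the flag applies to the whole pattern, so a standalone `Is` or `IS` would then act as a separator as well.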