I've tried various formulations of the following expression and have had the most success with the one below. Essentially, I'd like to plug it into .match() and grab every listed word that begins a sentence, appears in the middle of a sentence (whitespace on both sides), or ends a sentence. For example, in the sentence "This question is a bore," I might want "This", "is", and "bore", but not the "or" in the middle of "bore" or the "is" in "This". I'm using "sentence" loosely, as this is being applied to headers, anchor tags, p tags, etc.
I've managed to match whole words only, but I'm not getting all of the words I'd like. For example, "and" gets skipped while "the" gets picked up, even though both sit in the middle of a sentence with whitespace on either side. Any thoughts on how to refine it?
var exp = /\band|\bthe|\bor|\bwhich|\bon|\babout|\bmovies|\btomatoes|\breddit|\bplayed/gi;
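One thing worth noting about the pattern above: each alternative has a leading \b but no trailing one, so "and" can also match the start of a longer word such as "android". Below is a minimal sketch of one possible refinement, assuming the goal is whole-word matches only. The words array, the findWords helper, and the sample string are hypothetical names introduced for illustration, and this does not by itself explain why "and" was skipped in the original input.

```javascript
// Hypothetical word list, copied from the pattern in the question.
const words = ["and", "the", "or", "which", "on", "about", "movies",
               "tomatoes", "reddit", "played"];

// Wrap the alternatives in a non-capturing group so that \b applies
// on BOTH sides of every word, not just the left side.
const exp = new RegExp("\\b(?:" + words.join("|") + ")\\b", "gi");

// Hypothetical helper: returns every whole-word match,
// or an empty array when nothing matches (match() returns null then).
function findWords(text) {
  return text.match(exp) || [];
}

// Made-up sample text: "band" and "android" contain "and" but are not matched.
const sample = "The band played on and on about android movies";
console.log(findWords(sample));
// → [ 'The', 'played', 'on', 'and', 'on', 'about', 'movies' ]
```

Building the pattern from an array keeps the word list easy to edit. This is safe here because none of the words contain regex metacharacters; a list with characters like "." or "+" would need escaping first.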