
I've tried various formulations of the following expression and have had the most success with the one below. Essentially, I'd like to plug it into .match() and snag all of the words that either begin a sentence, appear in a sentence (whitespace on both sides), or end a sentence. For example, in the sentence, "This question is a bore," I might want "This" "is" and "bore," but not the "or" in the middle of "bore" or the "is" in "this". I'm using "sentence" loosely, as this is being applied to headers, anchor tags, p tags, etc.

I've managed to get whole words only, but I'm not getting all of the words I'd like. For example, "and" gets skipped though "the" gets picked up, despite both being in the middle of a sentence surrounded by whitespace. Any thoughts on refinement?

var exp = /\band|\bthe|\bor|\bwhich|\bon|\babout|\bmovies|\btomatoes|\breddit|\bplayed/gi;
Wiktor Stribiżew
Ryan
  • Can you show us a specific example, including which words *don't* get picked up that you think should? – freginold Dec 27 '17 at 18:45
  • I was just trying the prior comment. Let me pull an example. – Ryan Dec 27 '17 at 18:51
  • Your explanation at the start does not match the sample string with expected results. Perhaps you want `/\w*(?:and|the|wordshere)\w*/gi` to match words containing the alternatives you have. See [this **JSFiddle**](https://jsfiddle.net/z4q3q3bh/) – Wiktor Stribiżew Dec 27 '17 at 18:52
  • "NEW ON REALMS: SNOWBALLS AND FIREBALLS." --> And is not picked up. The statement should be case insensitive. "MINECON EARTH WATCH THE SHOW HERE!" --> the is picked up here, but "Deck The Halls" --> the is missed here. :shrug: – Ryan Dec 27 '17 at 18:55
  • See [this fiddle](https://jsfiddle.net/z4q3q3bh/1/) with the string above - `AND` and `ON` are found. [Here](https://jsfiddle.net/z4q3q3bh/2/), `the` is found in `"Deck The Halls"`. Please explain the requirements in a clear way. – Wiktor Stribiżew Dec 27 '17 at 18:56
  • Refinements? Yes: Split the string on space, use `filter(String)` to remove empties and then [intersect the arrays](https://stackoverflow.com/questions/16312528/check-if-an-array-contains-any-element-of-another-array-in-javascript) – ctwheels Dec 27 '17 at 18:58
  • @WiktorStribiżew Your response works wonderfully. Thank you! – Ryan Dec 27 '17 at 19:02
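
The split-and-intersect idea from the comments can be sketched as follows. This is only an illustration under stated assumptions: the names `wanted`, `text`, `tokens`, and `found` are hypothetical, the word list is shortened, and the split is on runs of non-word characters rather than on a single space so that punctuation such as the trailing period is dropped.

```javascript
// Hypothetical, shortened word list (lowercase for case-insensitive lookup).
var wanted = ["and", "the", "or", "on"];
var text = "NEW ON REALMS: SNOWBALLS AND FIREBALLS.";

// Assumption: split on non-word characters instead of a plain space,
// then use filter(String) to drop the empty strings the split leaves behind.
var tokens = text.split(/\W+/).filter(String);

// Keep only the tokens that appear in the word list, ignoring case.
var found = tokens.filter(function (token) {
  return wanted.indexOf(token.toLowerCase()) !== -1;
});

console.log(found); // ["ON", "AND"]
```

Unlike the regex approach, this only ever returns whole tokens, so "or" inside "bore" would not be reported.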

1 Answer


The requirement "that may or may not be surrounded by whitespace" means that you do not even need to check for whitespace (it is irrelevant). What you are after is matching words that contain any of the alternatives on your list.

Use a pattern like

\w*(?:and|the|or|which|on|about|movies|tomatoes|reddit|played)\w*

See the regex demo

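Here, \w* on both ends of the non-capturing group matches zero or more word characters (ASCII letters, digits, or the underscore), so each match expands to the whole word that contains one of the alternatives.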

JS demo:

var exp = /\w*(?:and|the|or|which|on|about|movies|tomatoes|reddit|played)\w*/gi;
var s = "This question is a bore,";
console.log(s.match(exp));
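
For comparison, here is a rough sketch of how this pattern differs from a whole-word variant that puts \b on both sides of the group. The whole-word pattern is not part of the answer above; it is an assumption added for illustration, as are the names `words`, `alternation`, `containing`, `wholeWord`, and `sample`.

```javascript
// Same word list as the pattern above, kept in an array so both patterns share it.
var words = ["and", "the", "or", "which", "on", "about", "movies", "tomatoes", "reddit", "played"];
var alternation = words.join("|");

// Pattern from the answer: any word that CONTAINS one of the alternatives.
var containing = new RegExp("\\w*(?:" + alternation + ")\\w*", "gi");

// Hypothetical whole-word variant: \b on both sides of the group,
// so only standalone occurrences of the listed words match.
var wholeWord = new RegExp("\\b(?:" + alternation + ")\\b", "gi");

var sample = "Deck the Halls on another shore";
console.log(sample.match(containing)); // ["the", "on", "another", "shore"]
console.log(sample.match(wholeWord));  // ["the", "on"]
```

Which of the two is appropriate depends on whether words such as "another" or "shore" should count as hits.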
Wiktor Stribiżew