
Below is the latest version of the regular expression I am using; it throws the error "Invalid Regular Expression."

Any help with the formatting of the regular expression would be much appreciated!

Below is my code:

// This function gets all the text in browser
function getText() {
    return document.body.innerText;
}
var allText = getText(); // stores the browser text in a variable

//regex set to rid text of all punctuation, symbols, numbers, and excess spaces
var matcher = new RegExp ("/(?<!\w)[a-zA-Z]+(?!\w)/", "g");

//cleanses text in browser of punctuation, symbols, numbers, and excess spaces
var newWords = allText.match(matcher);

//using a single space as the dividing tool, creates a list of all words
var Words=newWords.split(" ");
Lance
  • Use regex literal syntax, or, when using the RegExp constructor: **1.** you don't need the delimiter slashes, and **2.** backslashes must be doubled. **Use** `new RegExp("(?<!\\w)[a-zA-Z]+(?!\\w)", "g");` – Tushar Jan 26 '16 at 11:57
  • The regex is not doing what you think it is... – Wiktor Stribiżew Jan 26 '16 at 12:00
  • @Tushar just copied and pasted your recommendation and am still receiving the same error – Lance Jan 26 '16 at 12:03
  • JS doesn't support lookbehinds like `(?<!\w)`, try `\b` instead. – georg Jan 26 '16 at 12:03
  • This is not a dupe of [that question](http://stackoverflow.com/questions/34705240/convert-string-into-regular-expression-in-javascript), it just has a lot of issues. – Wiktor Stribiżew Jan 26 '16 at 12:04

1 Answer


Instead of

//regex set to rid text of all punctuation, symbols, numbers, and excess spaces
var matcher = new RegExp ("/(?<!\w)[a-zA-Z]+(?!\w)/", "g");
//cleanses text in browser of punctuation, symbols, numbers, and excess spaces
var newWords = allText.match(matcher);
//using a single space as the dividing tool, creates a list of all words
var Words=newWords.split(" ");

Just use

var Words = allText.match(/\b[a-zA-Z]+\b/g); // OR...
// var Words = allText.match(/\b[A-Z]+\b/ig);

This gets you all the "words" consisting solely of ASCII letters: String#match with a /g-based regex returns every substring matching the pattern, which here is one or more ASCII letters between word boundaries. There is no need to call split afterwards, since match already returns an array (and arrays have no split method).
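As a minimal sketch of that behaviour, the snippet below runs the same pattern over a made-up sample string. The helper name `extractWords` and the sample text are illustrative assumptions, not part of the original code. The only addition is a guard for the case where nothing matches, because String#match returns null rather than an empty array.

```javascript
// Hypothetical helper (not from the question): returns every run of ASCII
// letters that sits between two word boundaries.
function extractWords(text) {
    // With the g flag, match returns an array of all matches, or null if none.
    var found = text.match(/\b[a-zA-Z]+\b/g);
    return found === null ? [] : found;
}

// Made-up sample text for illustration.
var sample = "Hello, world! It is 2016 and foo_bar costs $5.";
var words = extractWords(sample);

console.log(words);
// → ["Hello", "world", "It", "is", "and", "costs"]
```

Note that `foo_bar` is skipped entirely: the underscore is a word character, so there is no word boundary on either side of it.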

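JS did not support lookbehind (i.e. the (?<!...) and (?<=...) constructs) before ES2018, so engines without it reject the pattern as an invalid regular expression; a word boundary \b does the job here.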
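If the pattern has to be built from a string with the RegExp constructor, as in the question, the two points raised in the comments apply: drop the delimiter slashes and double every backslash. The following is a small sketch of the difference; the variable names are illustrative only.

```javascript
// Literal syntax: the slashes are delimiters and \b is read by the regex parser.
var literal = /\b[a-zA-Z]+\b/g;

// Constructor syntax: no delimiter slashes, and each backslash is doubled so
// that the string itself contains the two characters "\" and "b".
var constructed = new RegExp("\\b[a-zA-Z]+\\b", "g");

// Without doubling, "\b" inside a string literal is a backspace (U+0008),
// so the regex engine never sees a word-boundary assertion.
var singleEscaped = "\b[a-zA-Z]+\b";

console.log(constructed.source === literal.source); // → true
console.log(singleEscaped.charCodeAt(0));           // → 8
console.log("one, two".match(constructed));         // → ["one", "two"]
```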

Note that you'd need something like .replace(/[^a-zA-Z]+/g, ' ') to rid the text of all punctuation, symbols, numbers, and excess spaces (\W+ alone would leave digits and underscores in place), but it seems you can just rely on .match(/\b[a-zA-Z]+\b/g).
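To compare the replace-based cleanup with the match-based extraction, here is a rough sketch over a made-up sample string (the variable names and the sample are assumptions for illustration). It also shows why `\W+` on its own keeps digits: `\W` only matches characters outside `[A-Za-z0-9_]`.

```javascript
// Made-up sample text for illustration.
var sample = "Hi there!!  It's 42 degrees... really?";

// Replace runs of non-word characters with one space, then split on spaces.
// Digits survive because \W does not match them.
var withNonWord = sample.replace(/\W+/g, " ").trim().split(" ");

// Replace runs of anything that is not an ASCII letter instead.
var lettersOnly = sample.replace(/[^a-zA-Z]+/g, " ").trim().split(" ");

// Extract the letter-only words directly; fall back to [] when match returns null.
var viaMatch = sample.match(/\b[a-zA-Z]+\b/g) || [];

console.log(withNonWord); // → ["Hi", "there", "It", "s", "42", "degrees", "really"]
console.log(lettersOnly); // → ["Hi", "there", "It", "s", "degrees", "really"]
console.log(viaMatch);    // → ["Hi", "there", "It", "s", "degrees", "really"]
```

The trim call matters for the replace-based versions: without it, the trailing "?" becomes a trailing space and split would produce an empty string as the last element.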

Wiktor Stribiżew