From another question, I have this expression to match words in a sentence:

var sentence = "Exclamation! Question? Full stop. Ellipsis...";
console.log(sentence.toLowerCase().match(/\w+(?:'\w+)*/g));

It works perfectly. However, now I am looking for a way to match exclamation marks, question marks, and full stops separately. The result should look like this:

[
  "exclamation",
  "!",
  "question",
  "?",
  "full",
  "stop",
  ".",
  "ellipsis",
  "."
]

The ellipsis should produce only one dot, not three separate dots.

Any help would be greatly appreciated!

MysteryPancake

2 Answers

Try the code below. It matches every punctuation character separately, so the ellipsis yields three dots:

var sentence = "Exclamation! Question? Full stop. Ellipsis...";
console.log(sentence.toLowerCase().match(/[?!.]|\w+/g));
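
For reference, here is a quick check of what this first pattern returns on the sample sentence (the variable names are only illustrative). Each punctuation character is matched on its own, so the ellipsis contributes three separate entries:

```javascript
// Illustrative check: [?!.] matches one character at a time,
// so a run such as "..." produces one array entry per dot.
const sampleSentence = "Exclamation! Question? Full stop. Ellipsis...";
const perCharTokens = sampleSentence.toLowerCase().match(/[?!.]|\w+/g);

console.log(perCharTokens.length);    // 11
console.log(perCharTokens.slice(-4)); // ["ellipsis", ".", ".", "."]
```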

If you want only one dot, you could match runs of punctuation and then collapse each run to a single character:

var sentence = "Exclamation!!! Question??? Full stop. Ellipsis...";

var arr = sentence.toLowerCase().match(/[?]+|[!]+|[.]+|\w+/g);
arr = arr.map(function (item) {
  // Collapse only runs of repeated punctuation; words such as "full" stay intact.
  return item.replace(/([?!.])\1+/g, "$1");
});

console.log(arr);
Pawan Singh

How about using a word boundary to only return one dot from the ellipsis?

var sentence = "Exclamation! Question? Full stop. Ellipsis...";
console.log(sentence.toLowerCase().match(/[a-z]+(?:'[a-z]+)*|\b[!?.]/g));
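
A small sketch of why the word boundary works here, and where it stops working. `\b` only matches between a word character and a non-word character, so the punctuation has to directly follow a letter, digit, or underscore. The quoted input below is made up for illustration:

```javascript
const boundaryPattern = /[a-z]+(?:'[a-z]+)*|\b[!?.]/g;

// Only the first dot of "..." directly follows a word character.
const afterWord = "ellipsis...".match(boundaryPattern);
console.log(afterWord); // ["ellipsis", "."]

// Hypothetical input: the "?" follows a quote, so there is no
// word boundary in front of it and it is skipped.
const afterQuote = "\"text\"? done.".match(boundaryPattern);
console.log(afterQuote); // ["text", "done", "."]
```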

Or a negative lookahead:

var sentence = "Exclamation! Question? Full stop. Ellipsis...";
console.log(sentence.toLowerCase().match(/[a-z]+(?:'[a-z]+)*|[!?.](?![!?.])/g));

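For the extended scenario you described in the comments (digits, and punctuation that follows quotes), a negative lookbehind seems to be effective. Note that lookbehind assertions need a JavaScript engine with ES2018 regex support.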

var sentence = "You're \"Pregnant\"??? How'd This Happen?! The vasectomy YOUR 1 job. Let's \"talk this out\"...";
console.log(sentence.toLowerCase().match(/[a-z\d]+(?:'[a-z\d]+)*|(?<![!?.])[!?.]/g));
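
One difference between the two lookaround patterns shows up on mixed runs such as `?!`: the negative lookahead keeps the last character of a run, while the negative lookbehind keeps the first. A minimal comparison, assuming an engine with lookbehind support; the input string is only an example:

```javascript
const mixedRun = "how'd this happen?! wait...";

// Lookahead: punctuation NOT followed by more punctuation (last of a run).
const lastOfRun = mixedRun.match(/[a-z\d]+(?:'[a-z\d]+)*|[!?.](?![!?.])/g);
console.log(lastOfRun);  // ["how'd", "this", "happen", "!", "wait", "."]

// Lookbehind: punctuation NOT preceded by punctuation (first of a run).
const firstOfRun = mixedRun.match(/[a-z\d]+(?:'[a-z\d]+)*|(?<![!?.])[!?.]/g);
console.log(firstOfRun); // ["how'd", "this", "happen", "?", "wait", "."]
```

Which one to prefer depends on whether `?!` should count as a question mark or an exclamation mark.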
mickmackusa
  • @MysteryPancake If you have fringe cases that break my pattern please update your question and leave me a comment. – mickmackusa Jul 29 '18 at 05:06
  • I had completely forgotten about negative lookahead, thanks for reminding. – Pawan Singh Jul 29 '18 at 05:51
  • @mickmackusa Thank you so much, this works perfectly! Just one small request - would it be possible for this to work for numbers as well? Do I just need to replace `[a-z]` with `\w`? – MysteryPancake Jul 29 '18 at 06:00
  • Also, would it be possible for it to match [!?.] after quotes, such as `'text'.`, `"text"?` or `'text'!`. This may be a bit too complicated, sorry – MysteryPancake Jul 29 '18 at 06:09
  • Perhaps a negative lookbehind will serve your purposes. I added another demo with a new pattern and sample string. – mickmackusa Jul 29 '18 at 12:16
  • @mickmackusa Thank you so much, yet again! Everything appears to work perfectly now. Sorry for a dumb question, but I am just wondering, what is the difference between [a-z\d] and \w? Is there a reason you are using that instead? – MysteryPancake Jul 30 '18 at 13:34
  • `\w` includes underscores (with numbers and letters). I am sacrificing pattern brevity for pattern accuracy / pattern intent. You can use `\w` if you wish. Actually, because of the conversion to lowercase, I'll drop the `i` pattern modifier at the end. – mickmackusa Jul 30 '18 at 13:38
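
To illustrate the difference described in the last comment, here is a short comparison on a made-up string containing an underscore. `\w` treats the underscore as part of the word, while `[a-z\d]` splits on it:

```javascript
// Hypothetical input, chosen only because it contains an underscore.
const withUnderscore = "snake_case is 1 word";

const usingWordClass = withUnderscore.match(/\w+/g);
console.log(usingWordClass);     // ["snake_case", "is", "1", "word"]

const usingExplicitClass = withUnderscore.match(/[a-z\d]+/g);
console.log(usingExplicitClass); // ["snake", "case", "is", "1", "word"]
```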