
This is the regex I'm currently using:

var sentences = fulltext.match(/[^\.!\?]+[\.!\?]+/g);

It returns an array of sentences, INCLUDING the spaces between them (I need every character). The problem is that it does not work with the ellipsis "...", and I suspect it fails on other unconventional forms of punctuation as well.

How can I fix my regex so that it matches this and other forms of punctuation?

Is there a beginner-friendly, example-driven guide to regex out there?

BenMorel
Belohlavek
  • The ellipsis also has its own character / code point ([U+2026](https://en.wikipedia.org/wiki/Ellipsis#Computer_representations), or `\u2026`), which is distinct from three consecutive `.`s (U+002E). – Jonathan Lonowski Jan 25 '14 at 22:58
  • possible duplicate of [Javascript regular expression for punctuation (international)?](http://stackoverflow.com/questions/7576945/javascript-regular-expression-for-punctuation-international) – Jonathan Lonowski Jan 25 '14 at 23:06

2 Answers


The Unicode code point of the ellipsis character is U+2026, so you can write `\u2026` in the regex to match it.

Code:

var fulltext = "First sentence… Second sentence. ";
fulltext.match(/[^.?!;\u2026]+[.?!;\u2026]+/g);

Output:

["First sentence…", " Second sentence."]
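Note that the trailing space after the last sentence is missing from that output, and any final text without closing punctuation would be dropped as well. Since the question asks to keep every character, here is a minimal sketch of one way to do that. The function name `splitSentences` and the choice of terminators are assumptions made for illustration, not part of the original code.

```javascript
// Hypothetical helper: splits text into sentence-like chunks without
// losing any characters. The terminator set (. ! ? and U+2026) is an
// assumption; extend the character classes as needed.
function splitSentences(text) {
  // First alternative: optional non-terminators followed by one or
  // more terminators. Second alternative: a trailing run with no
  // terminator at all (for example, whitespace at the end).
  var re = /[^.!?\u2026]*[.!?\u2026]+|[^.!?\u2026]+$/g;
  // match() returns null when nothing matches, so fall back to [].
  return text.match(re) || [];
}

var parts = splitSentences("First sentence… Second sentence. ");
console.log(parts);
// ["First sentence…", " Second sentence.", " "]
console.log(parts.join("") === "First sentence… Second sentence. ");
// true
```

Because every character lands in exactly one chunk, joining the array gives back the original string.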


Sujith PS

You can just add the ellipsis (and any other punctuation characters) to your character sets.

var input = "First sentence… Second sentence. ";
input.match(/[^\.\?!;…]+[\.\?!;…]+/g);

Result:

["First sentence…", " Second sentence."]
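If the list of punctuation characters keeps growing, it may be easier to build the regex from a string of terminators instead of editing both character sets by hand. The following is only a sketch: `buildSentenceRegex` is a hypothetical helper name, and the terminator lists passed to it are examples, not a complete inventory of sentence-ending punctuation.

```javascript
// Hypothetical helper: builds the same kind of regex as above from a
// string of sentence-terminating characters.
function buildSentenceRegex(terminators) {
  // Escape the characters that are special inside a character class:
  // backslash, closing bracket, caret and hyphen.
  var cls = terminators.replace(/[\\\]^-]/g, "\\$&");
  return new RegExp("[^" + cls + "]+[" + cls + "]+", "g");
}

var input = "First sentence… Second sentence. ";
console.log(input.match(buildSentenceRegex(".?!;\u2026")));
// ["First sentence…", " Second sentence."]

// Example with CJK terminators (U+3002, U+FF01, U+FF1F), chosen
// purely for illustration.
console.log("A\u3002B\uFF01".match(buildSentenceRegex("\u3002\uFF01\uFF1F")));
// ["A。", "B！"]
```

Like the regex it is modelled on, this version still drops text that comes after the last terminator.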
zord