1

Using JavaScript, I'm trying to split a paragraph into its sentences using regular expressions. My regular expression doesn't account for a sentence being inside brackets, and I would like to keep the delimiter.

I've put an example of the code in jsFiddle.net here

pb2q
Mike Mengell
  • Does this answer your question? [Javascript and regex: split string and keep the separator](https://stackoverflow.com/questions/12001953/javascript-and-regex-split-string-and-keep-the-separator) – Liam May 12 '23 at 11:35

3 Answers

4

I took the match approach rather than split. It could be tighter (e.g. what if a sentence ends with ..., etc).

text.match(/\(?[A-Z][^\.]+[\.!\?]\)?(\s+|$)/g);

http://jsfiddle.net/DepKF/1/
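To make the behaviour concrete, here is a minimal sketch that runs the pattern above against a sample paragraph. The sample text and the names `text`, `sentencePattern` and `sentences` are my own assumptions for illustration; they are not taken from the fiddle.

```javascript
// Sample paragraph (hypothetical, not from the original fiddle).
const text = 'First sentence. (Second one in brackets.) Third one here.';

// Same pattern as above: optional "(", a capital letter, everything up to
// the next ".", one closing punctuation mark, optional ")", then
// whitespace or the end of the string.
const sentencePattern = /\(?[A-Z][^\.]+[\.!\?]\)?(\s+|$)/g;

// match() returns null when nothing matches, so fall back to an empty array.
const sentences = (text.match(sentencePattern) || []).map(function (s) {
    return s.trim(); // drop the trailing whitespace picked up by (\s+|$)
});

console.log(sentences);
// [ 'First sentence.', '(Second one in brackets.)', 'Third one here.' ]
```

Note that `[^\.]+` only stops at a full stop, so a paragraph such as 'Really? Yes.' comes back as a single match. That is one of the places where the pattern could be tighter.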

Mitya
  • 1
    You pretty much have to use `match`, since JS doesn't have a delimiter-capture option for `split`. – chaos Jun 27 '12 at 16:17
  • Yes, you can use split, but you need a look-ahead (and no capturing group): `text.split(/\b(?![\?\.\!])/);` – bavo Dec 06 '15 at 23:40
1

@Utkanos Your idea is good, but I think replace may be better:

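var output = ''; // assumed initialisation; the callback appends to this string
text.replace(/\(?[A-Z][^\.]+[\.!\?]\)?/g, function (sentence) {
    // called once per matched sentence; the return value of replace is ignored
    output += '<p>'+ sentence + '</p>';
});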

http://jsfiddle.net/juGT7/1/

You don't need to loop again.
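A possible variation on the same idea, sketched under my own assumptions: instead of appending to an outer variable, the callback can return the wrapped sentence and the result of `replace` can be used directly. The sample text and the names `paragraph` and `html` are hypothetical.

```javascript
// Sample paragraph (hypothetical).
const paragraph = 'First sentence. (Second one in brackets.) Third one here.';

// The callback's return value is substituted for each matched sentence.
const html = paragraph.replace(/\(?[A-Z][^\.]+[\.!\?]\)?/g, function (sentence) {
    return '<p>' + sentence + '</p>';
});

console.log(html);
// <p>First sentence.</p> <p>(Second one in brackets.)</p> <p>Third one here.</p>
```

The spaces between sentences are not part of any match, so they stay in the output between the paragraphs.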

wiky
  • I hadn't thought of going down this route. Really good, thanks, but I need the loop for other things later on. – Mike Mengell Jun 28 '12 at 08:02
  • @wiky - I did give this method some thought, but ultimately there's no computational saving because there is a loop nonetheless - in this case, the loop is the iterative callback. – Mitya Jun 28 '12 at 08:36
1

Use the (?=pattern) lookahead in the regex. For example:

var string = '500x500-11*90~1+1';
string = string.replace(/(?=[$-/:-?{-~!"^_`\[\]])/gi, ",");
string = string.split(",");

This will give you the following result:

[ '500x500', '-11', '*90', '~1', '+1' ]

You can also split directly, without the intermediate replace:

string = string.split(/(?=[$-/:-?{-~!"^_`\[\]])/gi);

which gives the same result:

[ '500x500', '-11', '*90', '~1', '+1' ]
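A lookahead split keeps each delimiter at the start of the following piece. For sentences the punctuation belongs at the end, so a lookbehind is the closer fit. The following is a sketch of that variation, not something taken from the answers here: it assumes an engine with lookbehind support (ES2018 or later), and the sample text and the names `input` and `parts` are hypothetical.

```javascript
// Sample paragraph (hypothetical).
const input = 'First sentence. (Second one in brackets.) Third one here.';

// Split on whitespace that is preceded by ".", "!" or "?", optionally
// followed by a closing bracket. The lookbehind consumes nothing, so the
// punctuation and bracket stay attached to the sentence before the split.
const parts = input.split(/(?<=[.!?]\)?)\s+/);

console.log(parts);
// [ 'First sentence.', '(Second one in brackets.)', 'Third one here.' ]
```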
Fry