I would like to split a string of text into an array of sentences without losing the punctuation marks.

var string = 'This is the first sentence. This is another sentence! This is a question?' 
var splitString = string.split(/[!?.] /);
splitString  
  => ["This is the first sentence", "This is another sentence", "This is a question?"]

Only the last punctuation mark (?) is kept, because split discards the text matched by the separator. What is the best way to split after the punctuation mark of every sentence, so that splitString returns the following instead?

["This is the first sentence.", "This is another sentence!", "This is a question?"]
gyre
Nate Lipp
  • Duplicate of http://stackoverflow.com/questions/12001953/javascript-and-regex-split-string-and-keep-the-separator – pumbo Mar 28 '17 at 05:42
  • Also: http://stackoverflow.com/questions/4514144/js-string-split-without-removing-the-delimiters – pumbo Mar 28 '17 at 05:43
  • Anyway, this is easy to find by searching for "javascript string split keep delimiter" – pumbo Mar 28 '17 at 05:44

2 Answers

Instead of using split to target where you want to break your text, you can use String#match with a global regular expression and target the text you want to keep:

var splitString = string.match(/\S.+?[!?.]/g)

This avoids the need for look-behinds, which JavaScript's regex flavor did not support when this answer was written (lookbehind assertions were added later, in ES2018), and for additional calls to methods like Array#filter:

var string = 'This is the first sentence. This is another sentence! Is this a question?'

var splitString = string.match(/\S.+?[!?.]/g)

console.log(splitString)
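
For comparison, here is a minimal sketch of the look-behind variant mentioned above. It assumes an engine that implements ES2018 lookbehind assertions (for example a recent Node.js or browser), and the variable name viaSplit is only illustrative. Because the look-behind is zero-width, split consumes only the whitespace and the punctuation stays attached to each sentence:

```javascript
// Assumption: the engine supports ES2018 lookbehind assertions.
var string = 'This is the first sentence. This is another sentence! Is this a question?';

// Split on whitespace only where it directly follows ., ! or ?.
// (?<=[!?.]) checks the preceding character without consuming it,
// so the punctuation mark is not removed from the result.
var viaSplit = string.split(/(?<=[!?.])\s+/);

console.log(viaSplit);
// ["This is the first sentence.", "This is another sentence!", "Is this a question?"]
```

On engines without lookbehind support the regex literal itself is a syntax error, so the match-based version remains the more portable choice.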
gyre

A few approaches:

One solution uses String.prototype.match() to get an array of sentences:

var string = 'This is the first sentence. This is another sentence! This is a question?',
    items = string.match(/\S[^.!?]+[.?!]/g);

console.log(items);
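
One detail worth guarding against when using match this way: with the g flag it returns null, not an empty array, when nothing matches. The sketch below wraps the same pattern in a helper; the name splitSentences is hypothetical and not part of the original answer.

```javascript
// Hypothetical helper wrapping the match-based approach.
function splitSentences(text) {
  // A sentence here is: one non-whitespace character, one or more
  // characters that are not . ! or ?, then a single . ! or ?
  var found = text.match(/\S[^.!?]+[.?!]/g);
  // String#match returns null when there is no match at all,
  // so fall back to an empty array to keep the return type stable.
  return found || [];
}

console.log(splitSentences('First one. Second one! Third one?'));
// ["First one.", "Second one!", "Third one?"]
console.log(splitSentences('no terminator here'));
// []
console.log(splitSentences('Done. trailing words'));
// ["Done."]
```

As the last call shows, any trailing text that does not end in one of the three punctuation marks is silently dropped by this pattern, which may or may not be what you want.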

An alternative using String.prototype.split() looks like this:

var string = 'This is the first sentence. This is another sentence! This is a question?',
    items = string.split(/(\S[^.!?]+[.?!])/g).filter(function(s){ return s.trim(); });

console.log(items);

\S[^.!?]+ matches a non-whitespace character (\S) followed by one or more characters that are not one of the listed punctuation marks ([^.!?]); the trailing [.?!] then matches the punctuation mark that ends the sentence.
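
To see why the split variant needs both the capturing group and the filter, it can help to look at the intermediate array. This is a small sketch under the same input as above; the variable name raw is only illustrative. When the separator regex contains a capturing group, split inserts the captured text into the result, alongside the (here empty or whitespace-only) pieces between matches:

```javascript
var string = 'This is the first sentence. This is another sentence! This is a question?';

// The whole sentence is the "separator"; the capturing group makes
// split include it in the output instead of discarding it.
var raw = string.split(/(\S[^.!?]+[.?!])/);
console.log(raw);
// ["", "This is the first sentence.", " ", "This is another sentence!", " ", "This is a question?", ""]

// Dropping the empty and whitespace-only pieces leaves just the sentences.
var items = raw.filter(function (s) { return s.trim() !== ''; });
console.log(items);
// ["This is the first sentence.", "This is another sentence!", "This is a question?"]
```

The g flag is omitted here because split ignores it; the result is the same with or without it.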

RomanPerekhrest