I'm having a little difficulty with a regex in JavaScript.
Here's my fiddle: http://jsfiddle.net/6yhwzap0/
The function I have created is:
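var splitSentences = function(text) {
    // match() returns null when nothing matches, so fall back to an empty array
    var messy = text.match(/\(?[^\.\?\!]+[\.!\?]\)?/g) || [];
    var clean = [];
    for (var i = 0; i < messy.length; i++) {
        var s = messy[i];
        var sTrimmed = s.trim();
        if (sTrimmed.length > 0) {
            // Multi-word fragments (or the very first fragment) become their own sentence
            if (sTrimmed.indexOf(' ') >= 0 || clean.length === 0) {
                clean.push(sTrimmed);
            } else {
                // Single-word fragment: append it to the previous sentence
                var d = clean[clean.length - 1];
                d = d + s;
                var e = messy[i + 1];
                // Only look ahead if there is a next fragment
                if (e !== undefined && e.trim().indexOf(' ') >= 0) {
                    d = d + e;
                    i++;
                }
                clean[clean.length - 1] = d;
            }
        }
    }
    return clean;
};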
I get really good results with text.match(/\(?[^\.\?\!]+[\.!\?]\)?/g).
My big issue is that if a sentence ends with a quotation mark after the period, the quote is added to the start of the next sentence.
So for example the following:
"Hello friend. My name is Mud." Said Mud.
Should be split into the following array:
['"Hello friend.', 'My name is Mud."', 'Said Mud.']
But instead it is the following:
['"Hello friend.', 'My name is Mud.', '" Said Mud.']
(Note the stray quote at the start of the 'Said Mud.' string.)
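One direction that might work, as a rough sketch rather than a tested solution: let the pattern also consume any closing quotes or parentheses that directly follow the terminating punctuation. The function name splitSentencesKeepQuotes and the exact character class are assumptions made for illustration, and the single-word merging logic from the function above is left out to keep it short.

```javascript
// Hypothetical helper for illustration; not part of the original function.
function splitSentencesKeepQuotes(text) {
    // Same idea as the original pattern, but after the terminating
    // punctuation it also consumes closing quotes (straight or curly)
    // and closing parentheses.
    var pattern = /\(?[^.?!]+[.?!]+["'\u201D\u2019)]*/g;
    // match() returns null when nothing matches
    var matches = text.match(pattern) || [];
    return matches.map(function (s) {
        return s.trim();
    });
}

console.log(splitSentencesKeepQuotes('"Hello friend. My name is Mud." Said Mud.'));
// -> [ '"Hello friend.', 'My name is Mud."', 'Said Mud.' ]
```

This assumes a closing quote always sits directly after the period. An opening quote that follows a period with no space in between would be attached to the wrong sentence.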
Can anyone help me with this, or point me to a good JavaScript library that can split text into paragraphs, sentences, and words? I found blast.js, but I am using Angular.js and it did not integrate well at all.
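For the sentence and word parts, the built-in Intl.Segmenter API may be an option that needs no library, assuming the target runtime ships it (recent browsers and Node.js 16+). The sketch below is only an illustration: segmentText is a made-up wrapper name, and the exact boundaries depend on the runtime's locale data, so no expected output is shown.

```javascript
// Assumes a runtime that provides Intl.Segmenter.
// segmentText is a hypothetical wrapper, named here for illustration only.
function segmentText(text, granularity, locale) {
    var segmenter = new Intl.Segmenter(locale || 'en', { granularity: granularity });
    // segment() returns an iterable of objects with a `segment` string property
    return Array.from(segmenter.segment(text), function (part) {
        return part.segment;
    });
}

var sample = '"Hello friend. My name is Mud." Said Mud.';
console.log(segmentText(sample, 'sentence'));
console.log(segmentText(sample, 'word'));
```

The segments cover the whole input, so whitespace and punctuation come back as part of the results. Trimming or filtering is still up to the caller.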