53

What's a good strategy to get full words into an array, each with its succeeding character?

Example.

This is an amazing sentence.

Array(
[0] => This 
[1] => is
[2] => an
[3] => amazing
[4] => sentence.
)

Elements 0 - 3 would have a succeeding space, as a period succeeds the 4th element.

I need to split these by the spacing character; then, once the width of an element with the injected array elements reaches X, break onto a new line.

Please, gawd, don't give tons of code. I prefer to write my own; just tell me how you would do it.

Penny Liu
THE AMAZING
  • I'd take the approach provided in this answer http://stackoverflow.com/questions/4514144/js-string-split-without-removing-the-delimiters. But for your case change the `var newstringreplaced = string.replace(/d/gi, ",d");` to `var newstringreplaced = string.replace(/\s/gi, " ,");`. **Edit:** Should be advised this approach is only useful if your original string doesn't have a `,`. I suppose this solution is much safer: http://stackoverflow.com/a/4514241/1417588 – Suvi Vignarajah Aug 27 '13 at 19:12
  • Use the javascript [split](http://www.w3schools.com/jsref/jsref_split.asp) function. – saurabh Aug 27 '13 at 19:00

9 Answers

78

Similar to Ravi's answer, use match, but use the word boundary \b in the regex to split on word boundaries:

'This is  a test.  This is only a test.'.match(/\b(\w+)\b/g)

yields

["This", "is", "a", "test", "This", "is", "only", "a", "test"]

or

'This is  a test.  This is only a test.'.match(/\b(\w+\W+)/g)

yields

["This ", "is  ", "a ", "test.  ", "This ", "is ", "only ", "a ", "test."]
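For the second half of the question (breaking into a new line once the width reaches X), one possible sketch is a greedy pass over those tokens. The helper name `wrapTokens` and its `maxWidth` parameter are hypothetical, width is assumed to be a character count rather than a rendered pixel width, and the regex uses `\W*` instead of `\W+` so that a final word without trailing punctuation is still captured.

```javascript
// Hypothetical helper: greedily packs tokens into lines of at most maxWidth characters.
// Assumption: "width" means number of characters, not rendered pixels.
function wrapTokens(text, maxWidth) {
  // Each token is a word plus whatever non-word characters follow it.
  const tokens = text.match(/\b(\w+\W*)/g) || [];
  const lines = [];
  let line = "";
  for (const token of tokens) {
    // Start a new line when adding this token would exceed the limit.
    if (line.length > 0 && line.length + token.length > maxWidth) {
      lines.push(line);
      line = "";
    }
    line += token;
  }
  // Flush whatever is left in the last line.
  if (line.length > 0) lines.push(line);
  return lines;
}

const wrapped = wrapTokens("This is an amazing sentence.", 12);
console.log(wrapped);
// ["This is an ", "amazing ", "sentence."]
```

Note that trailing spaces count toward the width here, and a single token longer than `maxWidth` still gets a line of its own.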
Isaac
  • This is indeed the best answer, as split by space is not really usable for real-life scenarios. Well, unless you don't use punctuation and always use single whitespace. – Alex.Me Sep 27 '17 at 09:37
  • That converts "won't" into "won" and "t". This allows contractions: str.match(/\b(\w+)'?(\w+)?\b/g) – Thomas David Kehoe Feb 17 '18 at 23:03
  • english words only :( – iiic Jun 17 '20 at 12:36
  • \b does not work with non ASCII chars. For example 'é'.match(/\b(\w+)\b/g) returns null – Clement Feb 20 '23 at 19:49
  • @Clement Something like `/(\p{L}+\P{L}+)/` might work with ES2018 (or PCRE), but it's unclear to me whether there's any real support for that. Ref: https://stackoverflow.com/a/48902765/291280 – Isaac Mar 14 '23 at 19:14
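Following up on the comments above, here is a minimal sketch of a letter-based variant. It assumes a runtime with Unicode property escapes (ES2018), which only work when the regex carries the `u` flag; the pattern below is my own adjustment of the one in the last comment (`\P{L}*` rather than `\P{L}+`, plus the `g` and `u` flags), and the sample string is made up.

```javascript
// Assumption: the runtime supports \p{...} escapes (ES2018); they require the `u` flag.
const text = "caf\u00e9 au lait."; // "café au lait."
// One or more letters, followed by any run of non-letters (possibly empty at the end).
const tokens = text.match(/\p{L}+\P{L}*/gu) || [];
console.log(tokens);
// ["café ", "au ", "lait."]
```

Digits are not letters in this pattern, so a token such as "2nd" would be matched starting at "nd"; add `\p{N}` to the character set if numbers should count as part of a word.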
68

Just use split:

var str = "This is an amazing sentence.";
var words = str.split(" ");
console.log(words);
//["This", "is", "an", "amazing", "sentence."]

And if you need each word with its trailing space, just add it back in a loop afterwards:

var str = "This is an amazing sentence.";
var words = str.split(" ");
for (var i = 0; i < words.length - 1; i++) {
    words[i] += " ";
}
console.log(words);
//["This ", "is ", "an ", "amazing ", "sentence."]

Oh, and sleep well!

h2ooooooo
20

Try this:

var words = str.replace(/([ .,;]+)/g,'$1§sep§').split('§sep§');

This will

  1. insert a marker §sep§ after every chosen delimiter [ .,;]+
  2. split the string at the marked positions, thereby preserving the actual delimiters.
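The two steps above can be sketched on the sentence from the question. One detail worth knowing: when the string ends with a delimiter, the split leaves an empty string at the end, so this sketch filters it out. The `marker` variable is just an illustrative name, and the marker text is assumed never to occur in the input.

```javascript
const str = "This is an amazing sentence.";
const marker = "§sep§"; // assumed never to occur in the input

const words = str
  .replace(/([ .,;]+)/g, "$1" + marker) // step 1: insert the marker after each delimiter run
  .split(marker)                        // step 2: split at the markers, keeping the delimiters
  .filter((w) => w !== "");             // drop the empty piece left after a trailing delimiter

console.log(words);
// ["This ", "is ", "an ", "amazing ", "sentence."]
```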
Carsten Massmann
8

If you need the spaces and the dots, the easiest would be:

"This is an amazing sentence.".match(/.*?[\.\s]+?/g);

the result would be

['This ','is ','an ','amazing ','sentence.']
Ravi Rajendra
7

Use split and filter to drop the empty strings produced by leading, trailing, or repeated spaces.

let str = '     This is an amazing sentence.  ',
  words = str.split(' ').filter(w => w !== '');

console.log(words);
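An alternative sketch with the same effect, not taken from the answer itself: trim first, then split on runs of whitespace with a regex. This also treats tabs and newlines as separators, which may or may not be what you want.

```javascript
const str = "     This is   an amazing sentence.  ";
// trim() removes leading/trailing whitespace; /\s+/ treats any run of whitespace as one separator.
const words = str.trim().split(/\s+/);
console.log(words);
// ["This", "is", "an", "amazing", "sentence."]
```

One caveat: for an empty or all-whitespace string this returns `[""]` rather than an empty array, so the filter version is the safer choice if that case matters.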
Penny Liu
3

Here is an option if you want to include the space and run in O(N):

var str = "This is an amazing sentence.";
var words = [];
var buf = "";
for(var i = 0; i < str.length; i++) {
    buf += str[i];
    if(str[i] == " ") {
        words.push(buf);
        buf = "";
    }
}

if(buf.length > 0) {
    words.push(buf);
}
doogle
3

The following solution splits words not only on spaces but also on other kinds of whitespace and on punctuation characters. In addition, it works with non-ASCII characters.

It matches words by considering only characters that belong to certain Unicode categories. It allows letters (L), numbers (N), symbols (S) and marks (M), so it matches quite a broad set, but you can adjust it if you need a different set of characters. Other categories such as punctuation (P) and separators (Z) are not included and will therefore not match.

input.match(/[\p{L}\p{N}\p{S}\p{M}]+/gu)

Example

' \t a 件数  ,;-asd'.match(/[\p{L}\p{N}\p{S}\p{M}]+/gu)

Returns ['a', '件数', 'asd']

Clement
1

This can be done with lodash _.words:

var str = 'This is an amazing sentence.';
console.log(_.words(str, /[^, ]+/g));
<script src="https://cdnjs.cloudflare.com/ajax/libs/lodash.js/4.17.11/lodash.min.js"></script>
Penny Liu
1

It can be done with the split function:

"This is an amazing sentence.".split(' ')