25

I have a string like

 "asdf a  b c2 "

And I want to split it into an array like this:

["asdf", " ", "a", " ", " ", "b", " ", "c2", " "]

Using string.split(" ") removes the spaces, resulting in this:

["asdf", "a", "", "b", "c2"]

I thought of inserting extra delimiters, e.g.

string.replace(/ /g, "| |").replace(/||/g, "|").split("|");

But this gives an unexpected result.

gandalf3

5 Answers

23

Instead of splitting, it might be easier to think of this as extracting strings comprising either the delimiter or consecutive characters that are not the delimiter:

'asdf a  b c2 '.match(/\S+|\s/g)
// result: ["asdf", " ", "a", " ", " ", "b", " ", "c2", " "]
'asdf a  b. . c2% * '.match(/\S+|\s/g)
// result: ["asdf", " ", "a", " ", " ", "b.", " ", ".", " ", "c2%", " ", "*", " "]

A more Shakespearean definition of the matches would be:

'asdf a  b c2 '.match(/ |[^ ]+/g)

To or (not to )+.
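The two patterns above differ once the input contains whitespace other than a plain space. Here is a minimal sketch of that difference; `tokenize` and its `spacesOnly` flag are hypothetical names used only for illustration. It also guards against `match` returning `null` when nothing matches.

```javascript
// Hypothetical helper: tokenize on any whitespace, or on literal spaces only.
function tokenize(str, spacesOnly) {
  var pattern = spacesOnly ? / |[^ ]+/g : /\S+|\s/g;
  // match() with the g flag returns null when there is no match (e.g. an empty string)
  return str.match(pattern) || [];
}

var byWhitespace = tokenize("a\tb c", false);
// ["a", "\t", "b", " ", "c"]  (the tab is its own element)
var bySpace = tokenize("a\tb c", true);
// ["a\tb", " ", "c"]          (the tab stays inside the word)
var empty = tokenize("", false);
// []
```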

Ja͢ck
  • @Jack I hadn't, but that seems to work! Clearly, I need to learn regular expressions.. What does \S+ mean? – gandalf3 Jul 01 '14 at 07:08
  • 2
    @gandalf3 `\S` is the opposite of `\s` .. it could also be written as `[^\s]`. – Ja͢ck Jul 01 '14 at 07:09
  • +1 but note: wrapping it in a non-capturing group (`(?: )`) is not necessary. `'asdf a b c2 '.match(/\S+|\s/g)` would be the same – p.s.w.g Jul 01 '14 at 18:41
10

Use positive lookahead:

"asdf a  b c2 ".split(/(?= )/)
// => ["asdf", " a", " ", " b", " c2", " "]

Post-edit EDIT: As I said in the comments, the lack of lookbehind makes this a bit trickier. If the words consist only of word characters (letters, digits and underscores), you can fake lookbehind using the \b word boundary matcher:

"asdf a  b c2 ".split(/(?= )|\b/)
// => ["asdf", " ", "a", " ", " ", "b", " ", "c2", " "]

but as soon as some punctuation gets in, it breaks down, because it no longer splits only around spaces:

"asdf-eif.b".split(/(?= )|\b/)
// => ["asdf", "-", "eif", ".", "b"]

If you do have non-word characters you don't want to break on, then I would suggest a postprocessing method instead.

Post-think EDIT: This is based on JamesA's original idea, but refined to not use jQuery, and to correctly split:

function chop(str) {
  var result = [];
  var pastFirst = false;
  str.split(' ').forEach(function(x) {
    if (pastFirst) result.push(' ');
    if (x.length) result.push(x);
    pastFirst = true;
  });
  return result;
}
chop("asdf a  b c2 ")
// => ["asdf", " ", "a", " ", " ", "b", " ", "c2", " "]
Amadan
  • This works great for what I wrote in my question, but I just realized I made a mistake with the examples.. See my edited question. – gandalf3 Jul 01 '14 at 06:41
  • @gandalf3 you want them not as strings? – Henrik Andersson Jul 01 '14 at 06:42
  • @limelights I want each space to be in a single element. There should never be a space + anything else in one element. – gandalf3 Jul 01 '14 at 06:43
  • 1
    @limelights: Originally the split was before each space; now it is before and after each space. Unfortunately, JavaScript does not have lookbehind, so this is a bit harder... – Amadan Jul 01 '14 at 06:43
  • Thanks! This works great, but accepted Jack's answer because it's shorter (though that solution does split on any whitespace character, not just spaces. But it's fine for my case). I would accept both if I could.. (+1 btw) – gandalf3 Jul 01 '14 at 07:48
8

I'm surprised no one has mentioned this yet, but I'll post this here for the sake of completeness. If you have capturing groups in your expression, then .split will include the captured substring as a separate entry in the result array:

"asdf a  b c2 ".split(/( )/)  // or /(\s)/
// ["asdf", " ", "a", " ", "", " ", "b", " ", "c2", " ", ""]

Note, this is not exactly the same as the desired output you specified, as it includes an empty string between the two contiguous spaces and after the last space.

If necessary, you can filter out all empty strings from the result array like this:

"asdf a  b c2 ".split(/( )/).filter(String)
// ["asdf", " ", "a", " ", " ", "b", " ", "c2", " "]

However, if this is what you're looking for, I'd probably recommend you go with @Jack's solution.
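For delimiters other than a space, the capturing-group approach can be wrapped in a small helper. The sketch below assumes a non-empty literal delimiter; `splitKeeping` is a hypothetical name. It escapes regex metacharacters so that a delimiter such as "|" is matched literally.

```javascript
// Hypothetical helper: split on a literal delimiter and keep it in the result.
function splitKeeping(str, delimiter) {
  // Escape regex metacharacters so the delimiter is matched literally
  var escaped = delimiter.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // The capturing group makes split() include each delimiter in the output
  return str.split(new RegExp("(" + escaped + ")")).filter(function (piece) {
    return piece !== "";
  });
}

var spaces = splitKeeping("asdf a  b c2 ", " ");
// ["asdf", " ", "a", " ", " ", "b", " ", "c2", " "]
var pipes = splitKeeping("a|b||c", "|");
// ["a", "|", "b", "|", "|", "c"]
```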

p.s.w.g
1

Try clean-split:

const cleanSplit = require("clean-split");

cleanSplit("a-b-c", "-");
//=> ["a", "-", "b", "-", "c"]

cleanSplit("a-b-c", "-", { anchor: "before" });
//=> ["a-", "b-", "c"]

cleanSplit("a-b-c", "-", { anchor: "after" });
//=> ["a", "-b", "-c"]

Under the hood, it uses logic adapted from:

In your case, you can do something like this:

const cleanSplit = require("clean-split");

cleanSplit("asdf a  b c2 ", " ");
//=> ["asdf", " ", "a", " ", " ", "b", " ", "c2", " "]
Richie Bendall
0

You could use a little jQuery

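var toSplit = "asdf a  b c2 ".split(" ");
// Replace each empty string left by consecutive or trailing spaces with a single space
$.each(toSplit, function(index, value) {
    if (value === '') { toSplit[index] = ' '; }
});
// toSplit is now ["asdf", "a", " ", "b", "c2", " "]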

This turns the empty strings left by split(" ") back into single spaces, without leading spaces on the other elements. Note that the single spaces between adjacent words are still dropped.

JamesA