
I need a way to split a string on spaces but keep quoted substrings intact.

For example:

Input:
str = 'this "is a"test string'

Output:
[this, is a, test, string]

When I use:

str.match(/\\?.|^$/g).reduce((p, c) => {
    if(c === '"' || c === "'"){
        p.quote ^= 1;
    }else if(!p.quote && c === ' '){
        p.a.push('');
    }else{
        p.a[p.a.length-1] += c.replace(/\\(.)/,"$1");
    }
    return  p;
}, {a: ['']}).a

it keeps quoted substrings intact and splits on spaces as required.

However, it won't split the terms correctly in the supplied example, where a quoted substring is immediately followed by a letter. Instead, the result I get is this:

[this, is atest, string]

EDIT:

I believe this question is different to the other similar ones because none of them exclude the quotes and correctly split the terms when there is no space after the closing quote like in this case: 'this "is a"test string'.

robinCTS
Sam
  • Possible duplicate of [Split a string by whitespace, keeping quoted segments, allowing escaped quotes](https://stackoverflow.com/questions/4031900/split-a-string-by-whitespace-keeping-quoted-segments-allowing-escaped-quotes) – try-catch-finally Mar 10 '18 at 06:25
  • @try-catch-finally That duplicate is not valid for the same reason as the previous one - nothing there excludes the quotes. (And interestingly, one of the answers is the state machine code the OP is attempting to use.) – robinCTS Mar 10 '18 at 15:09

1 Answer


There are a lot of similar "splitting spaces and quotes" Q&As on SO, most of them with regex solutions. In fact, your code can be found in at least one of them (thanks for that, try-catch-finally).

While a few of these solutions exclude the quotes, only one that I could find works if there is no space delimiter following the closing quote, and none of them both exclude quotes and allow for missing spaces.

It is also not just a simple matter of adapting any of the regexes. If you do change the regex to use capturing groups, a simple match method is no longer possible. (The usual technique around this is to use exec in a loop.) If you don't use capturing groups you need to do a string manipulation afterwards to remove the quotes.
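For reference, the exec-in-a-loop technique mentioned above might look something like the sketch below. It assumes a pattern with one capturing group for the quoted text and one for a bare word; the function name splitWithExec is made up for illustration.

```javascript
// Hypothetical helper, not from the question: collects terms with exec in a loop.
function splitWithExec(input) {
    // Group 1 captures the text inside double quotes, group 2 a bare word.
    var regex = /"([^"]*)"|(\S+)/g;
    var terms = [];
    var m;
    // With the g flag, each exec call resumes where the previous match ended.
    while ((m = regex.exec(input)) !== null) {
        terms.push(m[1] !== undefined ? m[1] : m[2]);
    }
    return terms;
}

console.log(splitWithExec('this "is a"test string'));
// [ 'this', 'is a', 'test', 'string' ]
```

This works, but it needs noticeably more ceremony than a single match call.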

The neatest solution is to use map on the array result from the match.

Using the slice string manipulation method:

var str = 'this "is a"test string';
var result = (str.match(/"[^"]*"|\S+/g) || []).map(m => m.slice(0, 1) === '"' ? m.slice(1, -1) : m); // || [] guards against match returning null
console.log(result);

Using capturing groups:

var str = 'this "is a"test string';
var regex = /"([^"]*)"|(\S+)/g;
var result = (str.match(regex) || []).map(m => m.replace(regex, '$1$2'));
console.log(result);

The capturing group solution is the more general one, easily expandable to allow for different quotes, for example.

Note that the regex used in both solutions above is very simple: it only recognises double quotes and does not handle escaped quotes inside the sub-strings. (It works fine with nested single quotes and apostrophes, though.)
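As a rough sketch of how the capturing-group version could be expanded to accept single-quoted sub-strings as well, a third alternative can be added to the pattern. The function name splitQuoted and the sample input are made up for illustration. The trade-off is that a term beginning with an apostrophe may now be read as an opening quote, so this only suits input where that cannot happen.

```javascript
// Hypothetical extension: one capturing group per quote style, plus bare words.
function splitQuoted(input) {
    var regex = /"([^"]*)"|'([^']*)'|(\S+)/g;
    // Only one of the three groups participates in any match; the others
    // contribute an empty string to the replacement.
    return (input.match(regex) || []).map(m => m.replace(regex, '$1$2$3'));
}

console.log(splitQuoted("this \"is a\"test 'single quoted' string"));
// [ 'this', 'is a', 'test', 'single quoted', 'string' ]
```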

Explanation for the regex:

  • "[^"]*" → a " followed by any number of non-" characters followed by a "

  • | → or

  • \S+ → any consecutive sequence of non-whitespace characters

Note that the order of the two groups is critical. If \S+ is used first it will match the opening quote together with just the first following word.
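A quick way to see why the order matters is to run both orderings against the example string (the raw matches are shown here, before any quote stripping; the variable names are only for illustration):

```javascript
var str = 'this "is a"test string';

// Quoted alternative first: the whole quoted run is taken as one match.
var quotedFirst = str.match(/"[^"]*"|\S+/g);

// \S+ first: it wins at the opening quote and stops at the next space.
var wordFirst = str.match(/\S+|"[^"]*"/g);

console.log(quotedFirst); // [ 'this', '"is a"', 'test', 'string' ]
console.log(wordFirst);   // [ 'this', '"is', 'a"test', 'string' ]
```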


As for that state machine code you were attempting to use, it is very restrictive and only works for precisely one space between terms, and breaks if there are any apostrophes used anywhere (because it also allows the sub-strings to be single quoted).
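To illustrate both restrictions, here is the unmodified code from the question wrapped in a function (the name originalSplit is made up for illustration) and run on two small inputs:

```javascript
// The question's reduce-based state machine, unchanged apart from the wrapper.
function originalSplit(input) {
    return input.match(/\\?.|^$/g).reduce((p, c) => {
        if (c === '"' || c === "'") {
            p.quote ^= 1;
        } else if (!p.quote && c === ' ') {
            p.a.push('');
        } else {
            p.a[p.a.length - 1] += c.replace(/\\(.)/, "$1");
        }
        return p;
    }, {a: ['']}).a;
}

// Two spaces between terms produce an empty term.
console.log(originalSplit('a  b'));        // [ 'a', '', 'b' ]

// The apostrophe opens a "quote" that never closes, so nothing is split.
console.log(originalSplit("it's a test")); // [ 'its a test' ]
```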

It can be fixed to work for your specific example by pushing an empty string when an end quote is detected. To also allow for a single space after a closing quote, there needs to be a check for an existing empty string before pushing a new one:

var str = 'this "is a"test string';
var result = str.match(/\\?.|^$/g).reduce((p, c) => {
    if(c === '"' || c === "'"){
        if(!(p.quote ^= 1)){p.a.push('');} // <- modified
    }else if(!p.quote && c === ' ' && p.a[p.a.length-1] !== ''){ // <- modified
        p.a.push('');
    }else{
        p.a[p.a.length-1] += c.replace(/\\(.)/,"$1");
    }
    return  p;
}, {a: ['']}).a
console.log(result);
robinCTS