This is a slight variant of a common question: how do you split a string by whitespace, unless that whitespace is contained within a pair of quotes (either " or ')? There are a lot of questions like this here, and the best answer I've found so far is this one. The problem is, all these answers include the quotes themselves in the matches. For instance:

"foo bar 'i went to a bar'".match(/[^\s"']+|"([^"]*)"|'([^']*)'/g);

Results in:

["foo", "bar", "'i went to a bar'"]

Is there a solution that results in:

["foo", "bar", "i went to a bar"]

Note there is an edge case around this:

"foo bar \"'Hi,' she said, 'how are you?'\"".match(...);
// => ["foo", "bar", "'Hi,' she said, 'how are you?'"]

That is to say, a substring should be able to include quotations of its own, which means that aggressively doing something like this won't work:

"foo bar \"'Hi,' she said, 'how are you?'\"".match(...).map(function(string) {
  return string.replace(/'|"/g, '');
});

Update:

We can basically get it working with this:

"foo bar \"'Hi,' she said, 'how are you?'\"".match(/[^\s"']+|"([^"]*)"|'([^']*)'/g).map(function(string) {
    return string.replace(/^('|")|('|")$/g, '');
});

But that's quite ugly. (And it will also break an edge case like "5ft 5feet 5'".) There's gotta be a way to shrink that to a single regex, right?

nullnullnull
  • not using regex, but what if you just counted the number of `'` and if there are only two then you could use `.trim("'")` i doubt this is the solution - just getting ideas out there – Anthony Stringer Jun 16 '16 at 18:19
  • Thanks for the idea! One edge case around this might be: `"foo bar \"my mother's bread\""` – nullnullnull Jun 16 '16 at 18:26
  • Or rather: "foo bar \"mother's bread, father's lead\"" – nullnullnull Jun 16 '16 at 18:42
  • @AnthonyStringer That does give me an idea, though. I've got something working now (it's documented in the update above), but I still think there's gotta be a way to turn this into a single regex. – nullnullnull Jun 16 '16 at 18:51

2 Answers


Your regex is good enough. You just need to loop through the matches and pick the correct captured group:

var re = /'([^'\\]*(?:\\.[^'\\]*)*)'|"([^"\\]*(?:\\.[^"\\]*)*)"|[^\s"']+/g;
var arr = ['foo bar "\'Hi,\' she said, \'how are you?\'"',
  'foo bar \'i went to a bar\'',
  'foo bar \'"Hi," she said, "how are you?"\'',
  '\'"Hi," she \\\'said\\\', "how are you?"\''
];

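for (var i = 0; i < arr.length; i++) {
  var m;
  var result = [];
  // exec() with the g flag resumes from re.lastIndex on each call
  while ((m = re.exec(arr[i])) !== null) {
    // guard against an infinite loop on a zero-length match
    if (m.index === re.lastIndex)
      re.lastIndex++;
    // m[1]: single-quoted body, m[2]: double-quoted body, m[0]: bare word
    result.push(m[1] || m[2] || m[0]);
  }
  console.log(result);
}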
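For what it's worth, the same loop can be collapsed into a single expression with `String.prototype.matchAll` (ES2020). This is only a sketch under a couple of assumptions: `splitQuoted` and `TOKEN_RE` are names made up for illustration, and the pattern is copied unchanged from the code above. It also checks the groups against `undefined` instead of chaining `||`, so an empty quoted string such as `''` comes back as `""` rather than falling through to the raw match.

```javascript
// Same pattern as in the answer above: group 1 is the single-quoted body,
// group 2 is the double-quoted body, and no group means a bare word.
const TOKEN_RE = /'([^'\\]*(?:\\.[^'\\]*)*)'|"([^"\\]*(?:\\.[^"\\]*)*)"|[^\s"']+/g;

// splitQuoted is a hypothetical helper name, not part of the original answer.
function splitQuoted(input) {
  // matchAll works on a copy of the regex, so TOKEN_RE.lastIndex is not mutated.
  return Array.from(input.matchAll(TOKEN_RE), (m) => {
    if (m[1] !== undefined) return m[1]; // '...' matched, possibly empty
    if (m[2] !== undefined) return m[2]; // "..." matched, possibly empty
    return m[0];                         // unquoted run
  });
}

console.log(splitQuoted("foo bar 'i went to a bar'")); // ["foo", "bar", "i went to a bar"]
console.log(splitQuoted("foo '' bar"));                // ["foo", "", "bar"]
```

Whether the empty-string behavior matters depends on the input; if empty quoted tokens never occur, the `||` chain gives the same results.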
anubhava
  • This works great! If I wanted to add escaped quotes, how would I do that? That is, how would we get this to work: `'\'"Hi," she \\\'said\\\', "how are you?"\''` – nullnullnull Jun 16 '16 at 22:43
  • Escaped quotes can be supported, but the regex will become pretty complex. I will add an update to my answer in a few hours when I get back to my computer. – anubhava Jun 17 '16 at 12:09

Quoted strings are always fun. You need to test for even or odd numbers of escape characters to know when to terminate the string.
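To make the even/odd rule concrete, here is a minimal sketch that does the same check without a regex. `isEscapedAt` is a hypothetical helper introduced only for illustration: it counts the consecutive backslashes immediately before a position, and an odd count means the character at that position is escaped.

```javascript
// isEscapedAt is a hypothetical helper: returns true when the character at
// `index` is preceded by an odd number of consecutive backslashes.
function isEscapedAt(str, index) {
  let backslashes = 0;
  for (let i = index - 1; i >= 0 && str[i] === "\\"; i--) {
    backslashes++;
  }
  return backslashes % 2 === 1;
}

const sample = "a\\'b\\\\'c"; // characters: a \ ' b \ \ ' c
console.log(isEscapedAt(sample, 2)); // true: one backslash before the quote
console.log(isEscapedAt(sample, 6)); // false: two backslashes, so the quote is literal
```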

function quotedSplit(str) {
    let re = /'((?:(?:(?:\\\\)*\\')|[^'])*)'|"((?:(?:(?:\\\\)*\\")|[^"])*)"|(\w+)/g,
        arr = [],
        m;
    while(m = re.exec(str))
        arr.push(m[1] || m[2] || m[3]);

    return arr;
}

quotedSplit("fizz 'foo \\'bar\\'' buzz" + ' --- ' + 'fizz "foo \\"bar\\"" buzz');
// => ["fizz", "foo \\'bar\\'", "buzz", "fizz", 'foo \\"bar\\"', "buzz"]  (as string literals: the backslashes stay in the captured text)

Here, the first two alternatives match quoted strings (single-quoted, then double-quoted), and the third alternative matches a "word" (\w+).
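One thing to keep in mind is that the captured groups in `quotedSplit` still contain the backslashes from the input, so `\'` stays `\'` in the result. If the unescaped text is wanted, a post-processing step along these lines might do. `unescapeToken` is a hypothetical helper, and it assumes every backslash escapes exactly the one character that follows it (and that tokens contain no newlines, since `.` does not match them).

```javascript
// unescapeToken is a hypothetical helper: it drops the backslash from every
// backslash + character pair, so \' becomes ' and \\ becomes \.
function unescapeToken(token) {
  return token.replace(/\\(.)/g, "$1");
}

console.log(unescapeToken("foo \\'bar\\'")); // foo 'bar'
console.log(unescapeToken("a\\\\b"));        // a\b
```

It could be applied with `.map(unescapeToken)` on the array returned by `quotedSplit`.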

Paul S.