The goal is to split a string at spaces, but without splitting text that is in quotes or separating it from the adjacent text.

The input is effectively a string containing a list of key:value pairs. If a value contains a space, it is enclosed in quotes. I need a function that returns an array of key:value elements, as in the example below:

Example Input:

'a:0 b:1 moo:"foo bar" c:2'

Expected result:

['a:0', 'b:1', 'moo:foo bar', 'c:2'] (an array of length 4)

I have looked through a lot of other questions, but none of the ones I found seem to handle my case. Most either split at the space inside the quotes or split 'moo:' and 'foo bar' into separate parts.

Any assistance would be greatly appreciated, Craig

Crog
  • possible duplicate of [Regex to pick commas outside of quotes](http://stackoverflow.com/questions/632475/regex-to-pick-commas-outside-of-quotes) – Avinash Raj Sep 04 '14 at 11:08
  • Just replace the comma with a space in the regex from the link above. – Avinash Raj Sep 04 '14 at 11:09
  • The link above doesn't do what is intended: it does a replace, not a split. – Crog Sep 05 '14 at 12:42
  • There are numerous solutions but I have accepted Moob's solution as it fits perfectly into the problem scenario and actually improves the situation by removing the necessity to have quotes around values so enhances the system. – Crog Sep 05 '14 at 12:43
  • You could use the same regex for splitting, too. – Avinash Raj Sep 05 '14 at 12:46
  • Does this answer your question? [javascript split string by space, but ignore space in quotes (notice not to split by the colon too)](https://stackoverflow.com/questions/16261635/javascript-split-string-by-space-but-ignore-space-in-quotes-notice-not-to-spli) – ggorlen Oct 15 '20 at 17:48

3 Answers

You can use this regex to split:

var s = 'a:0 b:1 moo:"foo bar" c:2';

var m = s.split(/ +(?=(?:(?:[^"]*"){2})*[^"]*$)/g);
//=> [a:0, b:1, moo:"foo bar", c:2]

It splits on a space only when that space is outside quotes, using a positive lookahead that checks there is an even number of quotes between the space and the end of the string.
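
To make the lookahead's logic concrete, here is a minimal sketch (not part of the answer) that performs the same check by hand: at each space it counts the double quotes between that space and the end of the string, and splits only when the count is even. Like the regex, it assumes quotes are balanced and unescaped; the name `splitOnUnquotedSpaces` is made up for illustration.

```javascript
// Hypothetical helper that mimics / +(?=(?:(?:[^"]*"){2})*[^"]*$)/ by hand.
// Assumes double quotes are balanced and never escaped.
function splitOnUnquotedSpaces(s) {
  var parts = [];
  var start = 0;
  for (var i = 0; i < s.length; i++) {
    if (s[i] !== " ") continue;
    // Count the double quotes from this space to the end of the string.
    var quotesAfter = (s.slice(i).match(/"/g) || []).length;
    if (quotesAfter % 2 === 0) {
      // Even count: this space is outside quotes, so split here.
      parts.push(s.slice(start, i));
      // Like " +", treat a run of spaces as a single separator.
      while (s[i + 1] === " ") i++;
      start = i + 1;
    }
  }
  parts.push(s.slice(start));
  return parts;
}

var parts = splitOnUnquotedSpaces('a:0 b:1 moo:"foo bar" c:2');
// parts is ['a:0', 'b:1', 'moo:"foo bar"', 'c:2']
```

Note that it rescans the rest of the string at every space, which mirrors why this kind of lookahead can get slow on long inputs.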

anubhava
  • +1, this is awesome, I would really like a more thorough explanation though :) – epoch Sep 04 '14 at 11:05
  • Thanks @epoch: This works on the assumption that quotes are balanced and unescaped. If a space is outside quotes, then there will always be zero or an even number of quotes following the space until the end of the line. That is exactly what this lookahead `(?=(?:(?:[^"]*"){2})*[^"]*$)` is checking. – anubhava Sep 04 '14 at 11:11
  • Wow, that is amazing. None of the other regexes I found worked as well as your answer, anubhava. I wish I understood more about how it works. Also, is it possible to remove the quotes as per the example? That's nothing major, though. – Crog Sep 05 '14 at 12:26
  • But you still chose to accept another answer which doesn't exactly do what you asked in the question and that also returns `moo:"foo bar"` in the final result. – anubhava Sep 05 '14 at 13:47
  • Looks like I didn't respond for a long time and was a bit confused myself. For some reason, my justification comment ended up under the original post at the time: "I have accepted Moob's solution as it fits perfectly into the problem scenario and actually improves the situation by removing the necessity to have quotes around values" – Crog Jan 21 '20 at 11:24
  • Maybe I am reading that solution differently, but `str.split(/ +(?=[\w]+\:)/g)` does not remove quotes from the final result. And did you apply that regex to an input of `'a:0 b:1 moo-shu:"foo bar" c:2'`? – anubhava Jan 21 '20 at 14:05
  • Warning: the `/ +(?=(?:(?:[^"]*"){2})*[^"]*$)/` pattern is very slow. If you have long strings, consider using a different expression if you do not want to experience slowdowns (as for example [described here](https://stackoverflow.com/q/62150217/3832970)). – Wiktor Stribiżew Jun 02 '20 at 10:51
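
The comments above raise two follow-ups: stripping the quotes (as in the question's expected result) and the cost of the lookahead on long strings. One common alternative, not taken from any of the answers, is to match the tokens themselves rather than the split positions, and then remove the quotes. A sketch, assuming balanced, unescaped double quotes; `splitPairs` is a hypothetical name:

```javascript
// Hypothetical helper: match each token instead of splitting, then drop the quotes.
// Assumes double quotes are balanced and never escaped.
function splitPairs(str) {
  // A token is a run made of non-space, non-quote characters and/or
  // complete "quoted sections" (which may contain spaces).
  var tokens = str.match(/(?:[^\s"]|"[^"]*")+/g) || [];
  // Remove the quote characters from each token.
  return tokens.map(function (t) {
    return t.replace(/"/g, "");
  });
}

var result = splitPairs('a:0 b:1 moo:"foo bar" c:2');
// result is ['a:0', 'b:1', 'moo:foo bar', 'c:2']
```

Under that assumption, each character is consumed once by a token or skipped once as a separator, instead of the rest of the string being rescanned at every space, so it should scale better on long inputs.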

You could approach it slightly differently and use a regular expression that splits at spaces followed by word characters and a colon (rather than at spaces that are not inside a quoted part):

var str = 'a:0 b:1 moo:"foo bar" c:2',
    arr = str.split(/ +(?=[\w]+\:)/g);
/* [a:0, b:1, moo:"foo bar", c:2] */

What's this regex doing?
It matches one or more literal space characters, then uses a positive lookahead to assert that what follows can be matched by:
[\w]+ = match any word character [a-zA-Z0-9_] one or more times.
\: = match the : character once (backslash escaped).
g = global modifier. Note that `split` already splits at every match, so this flag makes no difference here.
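
A quick way to see what this lookahead does and does not check is to run it on a few inputs. The sketch below writes the same pattern without the `g` flag, the brackets around `\w`, or the backslash before `:` (none of which change what it matches). The extra inputs are made up to show its limits: a key containing `-` (raised in the comments on the other answer) is not matched by `\w+`, and a quoted value containing `word:` after a space does get split.

```javascript
// Same pattern as the answer, written as / +(?=\w+:)/.
var re = / +(?=\w+:)/;

// The intended case: the quoted space survives because "bar" is not followed by ":".
var basic = 'a:0 b:1 moo:"foo bar" c:2'.split(re);
// basic is ['a:0', 'b:1', 'moo:"foo bar"', 'c:2']

// "-" is not a word character, so there is no split before "moo-shu:".
var hyphenKey = 'a:0 moo-shu:"foo bar"'.split(re);
// hyphenKey is ['a:0 moo-shu:"foo bar"']

// The lookahead ignores quotes, so "x:" inside a quoted value triggers a split.
var colonInValue = 'moo:"foo x:bar" c:2'.split(re);
// colonInValue is ['moo:"foo', 'x:bar"', 'c:2']
```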

Moob
  • This is much shorter than anubhava's regex. I don't understand regexes well enough to be sure, but it seems to do the same job, even though others have upvoted the other solution. Neither one removes the quotes from the result for now, but that's no biggie. – Crog Sep 05 '14 at 12:37
  • Reading it again, I see the elegance of your solution. It fits the problem description perfectly and actually makes quotes around the value text unnecessary. – Crog Sep 05 '14 at 12:40
  • I have gone with this. Can I somehow make the separators (the space ' ' and ':') variables in the code? I am not sure how to add that to this sort of regex. – Crog Sep 05 '14 at 12:44
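
On the question in the last comment: one way to make the separators configurable is to build the pattern as a string and pass it to the `RegExp` constructor, escaping the separators so characters such as `|` or `.` are taken literally. A sketch; `escapeRegExp` and `splitKeyValues` are hypothetical names, and keys are still assumed to consist of word characters (`\w+`).

```javascript
// Hypothetical helper: escape characters that have a special meaning in a RegExp.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: split on runs of `sep` that are followed by a key and `kvSep`.
// With sep = " " and kvSep = ":" this is equivalent to / +(?=\w+:)/.
function splitKeyValues(str, sep, kvSep) {
  var pattern = "(?:" + escapeRegExp(sep) + ")+(?=\\w+" + escapeRegExp(kvSep) + ")";
  return str.split(new RegExp(pattern));
}

var spaced = splitKeyValues('a:0 b:1 moo:"foo bar" c:2', " ", ":");
// spaced is ['a:0', 'b:1', 'moo:"foo bar"', 'c:2']

var piped = splitKeyValues('a=0|b=1|moo="foo|bar"|c=2', "|", "=");
// piped is ['a=0', 'b=1', 'moo="foo|bar"', 'c=2']
```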

Any special reason it has to be a regexp?

var str = 'a:0 b:1 moo:"foo bar" c:2';

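var parts = [];
var currentPart = "";
var isInQuotes = false; // true while we are between an opening and a closing quote

for (var i = 0; i < str.length; i++) {
  var char = str.charAt(i);
  if (char === " " && !isInQuotes) {
    // A space outside quotes ends the current part.
    parts.push(currentPart);
    currentPart = "";
  } else {
    currentPart += char;
  }
  if (char === '"') {
    // Every double quote toggles the in-quotes state.
    isInQuotes = !isInQuotes;
  }
}

// Push whatever is left after the last space.
if (currentPart) parts.push(currentPart);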
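
If you also want the quotes removed, as in the question's expected result, the same loop can skip the quote characters instead of appending them. A sketch of that variation (the function wrapper and the name `splitOutsideQuotes` are hypothetical); it also drops empty parts, so repeated spaces do not produce empty strings:

```javascript
// Hypothetical variation of the loop above: quote characters toggle the state
// but are not copied into the output, and empty parts are dropped.
function splitOutsideQuotes(str) {
  var parts = [];
  var currentPart = "";
  var isInQuotes = false;

  for (var i = 0; i < str.length; i++) {
    var ch = str.charAt(i);
    if (ch === '"') {
      // Toggle the state; the quote itself is not kept.
      isInQuotes = !isInQuotes;
    } else if (ch === " " && !isInQuotes) {
      // A space outside quotes ends the current part (empty parts are skipped).
      if (currentPart) parts.push(currentPart);
      currentPart = "";
    } else {
      currentPart += ch;
    }
  }

  if (currentPart) parts.push(currentPart);
  return parts;
}

var parts = splitOutsideQuotes('a:0 b:1 moo:"foo bar" c:2');
// parts is ['a:0', 'b:1', 'moo:foo bar', 'c:2']
```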
RoToRa