Splitting a string by spaces unless spaces are within curly or square brackets at the shallowest level

Question

I want to separate a string into an array based on spaces, with the caveat that spaces within a pair of curly or square brackets should be ignored.

I was able to find some answers that are close to what I want here and here, but they don't handle brackets nested within other brackets.

How do I split this string:

foo bar["s 1"]{a:{b:["s 2", "s 3"]}, x:" [s 4] "} woo{c:y} [e:{" s [6]"}] [simple square bracket] {simple curly bracket}

Into this array?

["foo", "bar[\"s 1\"]{a:{b:[\"s 2\", \"s 3\"]}, x:\" [s 4] \"}", "woo{c:y}", "[e:{\" s [6]\"}]", "[simple square bracket]", "{simple curly bracket}"]

When using the regex from the first link, I modified the regular expression to work with square and curly brackets, and got the correct output for the simple, un-nested parts of the example, but not for the complex nested area. See here.

The second link's answers rely on JSON-style formatting with colons. They don't apply here because my input will not necessarily be valid JSON, and it has no comparable character pattern that the answer could be adapted to.

According to a commenter, this may not be possible to do with regular expressions. Even if that is the case, any way of splitting the string that achieves the desired result would be considered a correct answer.

Hi there. When learning how to work with regular expressions, your best bet is to start writing. By searching for others' expressions and trying to shoe-horn them into your application, you aren't likely to learn much. Try to extract something useful, learn from your attempt, then try to improve upon it. Ask questions when you're stumped on a specific issue, and show your work. Folks here are more than willing to help out, but without some effort on your part you're unlikely to get much help. After all, you get out what you put in. — Fissure King, Jul 10 '18 at 01:53
I've spent the past 30-40 minutes on this issue, and I'm not completely unfamiliar with regular expressions. I considered adding examples of why the two links I included did not work, but decided against it for brevity. Should I add those examples to prove that I put effort into this question? — getfugu, Jul 10 '18 at 01:56
@getfugu Yes, please add your previous attempts to help us understand what you have already tried and where you got stuck. — MSB, Jul 10 '18 at 01:57
@getfugu, Sorry, I didn't mean to suggest you hadn't put any effort into it, and it's certainly not about proving you've tried. I meant to suggest that specific problems are better than general ones, e.g., "I expected `x` to result in `y`, but instead observed `z`", rather than "How do I get to `y`?". — Fissure King, Jul 10 '18 at 01:59
Thanks for the input, the additional steps of attempted problem solving have been added. — getfugu, Jul 10 '18 at 02:15
Well, it is simple, you can't. Not with regular expressions. Regular expressions match regular languages and yours are not. You need to tokenize and parse it thus. — Antti Haapala -- Слава Україні, Jul 10 '18 at 04:04

score 1 · Accepted Answer · answered Jul 10 '18 at 05:31

Regular expressions are great for certain things. But if you wish to support arbitrarily deeply nested expressions, then regular expressions aren't really the right tool for the job.
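To make the limitation concrete, here is a small sketch using a hypothetical pattern (`oneLevel` is an illustrative name, not the regex from the question) that tolerates exactly one level of brackets. It handles flat bracket pairs but breaks a token apart as soon as brackets are nested, and each extra level of nesting would need another hand-written layer in the pattern.

```javascript
// Hypothetical pattern: a token is a run of non-space, non-bracket characters
// and/or bracket pairs that contain no further brackets.
const oneLevel = /(?:[^\s\[\]{}]|\[[^\[\]]*\]|\{[^{}]*\})+/g;

// One level of brackets: the tokens come out whole.
const flat = 'woo{c:y} [simple square bracket]'.match(oneLevel);
console.log(flat);    // [ 'woo{c:y}', '[simple square bracket]' ]

// Two levels: the outer braces are not recognised and the token is broken up.
const nested = 'a{b:{c d}} e'.match(oneLevel);
console.log(nested);  // [ 'a', 'b:{c d}', 'e' ]
```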

Instead, consider the following approach which uses a stack to track beginnings and endings of bracketed expressions:

Sample code

function getfugu_split(input) {
  var i = 0, stack = [], parts = [], part = '';
  while (i < input.length) {
    var c = input[i]; i++;  // get character
    if (c === ' ' && stack.length === 0) {  // a space outside all brackets ends the current part
      // the replace puts a literal backslash before each double quote in the part
      parts.push(part.replace(/"/g, '\\\"'));  // append part
      part = '';  // reset part accumulator
      continue;
    }
    if (c === '{' || c === '[') stack.push(c);  // begin curly or square bracket
    else if (c === '}' && stack[stack.length - 1] === '{') stack.pop();  // end curly bracket
    else if (c === ']' && stack[stack.length - 1] === '[') stack.pop();  // end square bracket
    part += c;  // append character to current part
  }
  if (part.length > 0) parts.push(part.replace(/"/g, '\\\"'));  // append remaining part
  return parts;
}

Example usage

getfugu_split('foo bar["s 1"]{a:{b:["s 2", "s 3"]}, x:" [s 4] "} woo{c:y} [e:{" s [6]"}] [simple square bracket] {simple curly bracket}')

Output

["foo", "bar[\"s 1\"]{a:{b:[\"s 2\", \"s 3\"]}, x:\" [s 4] \"}", "woo{c:y}", "[e:{\" s [6]\"}]", "[simple square bracket]", "{simple curly bracket}"]
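One detail to be aware of when reading this output: the `replace` call in the sample code adds a literal backslash before every double quote, so the returned strings are longer than the matching slices of the input. If the backslashes are only wanted for display, a possible alternative (a sketch, not part of the original answer) is to leave the strings untouched and let `JSON.stringify` do the escaping when printing:

```javascript
const raw = 'bar["s 1"]';

// What the replace in the sample code produces: a backslash before each quote.
const escaped = raw.replace(/"/g, '\\\"');
console.log(escaped.length - raw.length);  // 2: one extra character per double quote

// JSON.stringify escapes quotes in the printed form only; raw itself is unchanged.
const printed = JSON.stringify([raw]);
console.log(printed);  // ["bar[\"s 1\"]"]
```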

Note that the above code almost certainly won't handle every possible requirement you may have or edge case you're likely to encounter. (e.g. Imbalanced square/curly braces may not be handled the way you'd expect.) But if you understand what it's doing, then you should be able to adapt it to suit your needs. I hope this helps! :)
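As one example of such an adaptation, the sketch below is a stricter variant under a few assumptions that go beyond the question: imbalanced brackets should raise an error, brackets that appear inside double-quoted text within a bracketed section should not be counted, repeated spaces should not produce empty parts, and quotes are left unescaped. The name `splitOutsideBrackets` is made up for illustration, and backslash-escaped quotes in the input are not handled.

```javascript
// Hypothetical stricter variant of the stack-based splitter.
function splitOutsideBrackets(input) {
  const closers = new Map([['}', '{'], [']', '[']]);
  const stack = [];
  const parts = [];
  let part = '';
  let inQuotes = false;  // only ever true while inside at least one bracket

  for (const c of input) {
    if (c === '"' && stack.length > 0) {
      inQuotes = !inQuotes;  // quoted text inside brackets is treated as opaque
    } else if (!inQuotes) {
      if (c === ' ' && stack.length === 0) {
        if (part.length > 0) parts.push(part);  // ignore empty parts from repeated spaces
        part = '';
        continue;
      }
      if (c === '{' || c === '[') {
        stack.push(c);
      } else if (closers.has(c) && stack.pop() !== closers.get(c)) {
        throw new SyntaxError('Unexpected "' + c + '"');
      }
    }
    part += c;
  }
  if (inQuotes) throw new SyntaxError('Unterminated double quote');
  if (stack.length > 0) throw new SyntaxError('Unclosed "' + stack[stack.length - 1] + '"');
  if (part.length > 0) parts.push(part);
  return parts;
}

console.log(splitOutsideBrackets('x["] "] y'));  // [ 'x["] "]', 'y' ]
```

Throwing on imbalance is only one possible policy; returning the partial result, or treating a stray closing bracket as an ordinary character the way the original code does, may suit other inputs better.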

That's more involved than I thought it would be, and thank you for the comments explaining. Regular expressions don't understand context (in my case, nesting), so now it makes sense why the counting system is needed. — getfugu, Jul 10 '18 at 06:46
I'm glad that helped you, @getfugu. It's been my experience that choosing the right approach / tools for the task at hand -- or at least choosing the best one for you and your understanding -- can be one of the hardest parts of tackling a new, challenging problem. And sometimes, there doesn't even seem to be one "best" approach. The more problems (and the greater the variety) you try to face and solve, the better you'll get. Just keep challenging yourself, and you'll be able to come up with stuff like that^ (and much better) much sooner than you think! — porcus, Jul 10 '18 at 14:05
