This is a direct follow-up to my previous question, where I got the following regex:

const matches = text.match(/(?:\([^()]*(?:\([^()]*\)[^()]*)*\)|[^,])+/g);

From this a,(b, b),c (aaa, (bbb, cccc, ddd)),d I get

a
(b, b)
c (aaa, (bbb, cccc, ddd))
d

But it fails for a,(b, b),c (aaa, ((b b), cccc, ddd)),d, where the parentheses are nested three levels deep. That makes sense once you dissect the regex: it only accounts for two levels.

I tried to update it to handle one more level of parentheses and came up with the following:

const matches = text.match(/(?:\([^()]*(?:\([^()]*(?:\([^()]*\)[^()]*)*\)[^()]*)*\)|[^,])+/g)

It works (online demo), but I am not sure it's the optimal solution, and I don't know whether it covers all cases. Can anyone confirm, or is there a better regex?

I am also looking for a way to generate such a regex for a given number of nesting levels. I have it for 2 and 3, but what about N? Will it work if I always repeat the part (?:\([^()]*\)[^()]*)* recursively? I know a regex cannot handle an arbitrary number of nested parentheses, but that is not what I am after. For a given number, I want to generate the regex (using JS) and use it.

Temani Afif
  • Why regex? This seems like it wants a parser. – Dave Newton Apr 13 '23 at 20:43
  • @DaveNewton if there is native JS functionality that does this better than a regex, why not. I am open to it. – Temani Afif Apr 13 '23 at 20:46
  • *"Will it work if I always repeat the following part"*: yes, as long as you repeat it enough times. – trincot Apr 13 '23 at 20:58
  • Very related non-regex approach: https://stackoverflow.com/questions/75892643/how-can-i-get-the-outer-only-angle-brackets-nodes-algorithm/75892885#75892885 – Kosh Apr 13 '23 at 21:32

2 Answers


See this answer, under the JS section > Without Recursion. I just added a JS snippet there that generates a pattern for a chosen max depth. Below is that snippet modified for your needs (non-commas before/after).

// JS snippet to generate the pattern
function generatePattern() {
  // Read the max depth, define the pattern parts, then build the pattern
  const depth = Number(document.getElementById("maxDepth").value);
  const p = ['\\([^)(]*(?:', '\\([^)(]*\\)', '[^)(]*)*\\)'];
  console.log('(?:' + p[0].repeat(depth) + p[1] + p[2].repeat(depth) + '|[^,])+');
}
generatePattern();

<!-- HTML part of the snippet -->
Max depth = <input type="text" id="maxDepth" size="1" value="2">
<input type="submit" onclick="generatePattern()" value="generate pattern!">

The regex you provided looks optimal to me (it covers 2 levels of nesting inside the outer parentheses). It is already unrolled and efficient, so I doubt there is much room for further optimization.
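For use outside the browser, the same idea can be sketched as a plain function that returns a RegExp instead of logging a string. The names buildSplitRegex and splitArgs are made up for this illustration, and nesting counts the levels allowed inside the outer pair, so nesting = 2 should reproduce the second regex from the question.

```javascript
// Hypothetical helper: build a splitter regex that tolerates up to `nesting`
// levels of parentheses nested inside an outer pair (0 = flat pairs only).
function buildSplitRegex(nesting) {
  if (!Number.isInteger(nesting) || nesting < 0) {
    throw new RangeError("nesting must be a non-negative integer");
  }
  const open = "\\([^()]*(?:";  // opens one level and starts the repeated inner group
  const core = "\\([^()]*\\)";  // innermost, flat pair of parentheses
  const close = "[^()]*)*\\)";  // closes the repeated group and that level
  const balanced = open.repeat(nesting) + core + close.repeat(nesting);
  return new RegExp("(?:" + balanced + "|[^,])+", "g");
}

// Hypothetical wrapper: split on commas that sit outside the parentheses.
function splitArgs(text, nesting) {
  return text.match(buildSplitRegex(nesting)) || [];
}

console.log(splitArgs("a,(b, b),c (aaa, ((b b), cccc, ddd)),d", 2));
```

Input nested deeper than the chosen level is not rejected. It is simply split in the wrong places, so pick a level with some headroom.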

bobble bubble
  • Though this looks like a nice answer, the complexity of the regex makes it pretty hard to understand. – 3limin4t0r Apr 14 '23 at 13:35
  • @3limin4t0r There are plenty of resources available in other answers already for further reading. It's hard to cover this here, I thought more about just generating the pattern. – bobble bubble Apr 14 '23 at 13:58

Most of the time a single regex isn't the right tool when you need to track nesting depth. For these cases you'll probably want to use the programming language to parse the string.

For this scenario a simple parser could look like this:

function parseArgString(string) {
  const args = [];
  let argStartIndex = 0; // index where the current argument begins
  let depth = 0;         // current parenthesis nesting depth

  for (let index = 0; index < string.length; ++index) {
    const char = string[index];

    if (char === "(") depth += 1;
    if (char === ")") depth -= 1;
    if (depth < 0) throw new Error('unexpected ")" character');

    // Only a comma outside all parentheses separates arguments
    if (char === "," && depth === 0) {
      args.push(string.slice(argStartIndex, index).trim());
      argStartIndex = index + 1;
    }
  }

  // Whatever follows the last top-level comma is the final argument
  const finalArg = string.slice(argStartIndex).trim();
  if (finalArg.length) args.push(finalArg);

  return args;
}

const argString = "a,(b, b),c (aaa, ((b b), cccc, ddd)),d";
console.log(parseArgString(argString));
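
One gap in the parser above is that it accepts input with an unclosed "(", such as "a,(b". If that should count as an error too, a stricter variant might look like the sketch below. The name splitTopLevel and the choice of SyntaxError are assumptions made for this illustration.

```javascript
// Hypothetical stricter variant: same splitting rule as parseArgString,
// but it also rejects input that ends with unclosed "(".
function splitTopLevel(input) {
  const parts = [];
  let start = 0; // index where the current part begins
  let depth = 0; // current parenthesis nesting depth

  for (let i = 0; i < input.length; i++) {
    const ch = input[i];
    if (ch === "(") {
      depth++;
    } else if (ch === ")") {
      depth--;
      if (depth < 0) throw new SyntaxError(`unexpected ")" at index ${i}`);
    } else if (ch === "," && depth === 0) {
      // A comma outside all parentheses ends the current part
      parts.push(input.slice(start, i).trim());
      start = i + 1;
    }
  }

  if (depth > 0) throw new SyntaxError(`${depth} unclosed "(" at end of input`);

  const last = input.slice(start).trim();
  if (last.length) parts.push(last);
  return parts;
}

console.log(splitTopLevel("a,(b, b),c (aaa, ((b b), cccc, ddd)),d"));
```

Unlike the generated regex, this handles any nesting depth, because the depth counter does the bookkeeping the pattern would otherwise have to unroll.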
3limin4t0r