1

I'm trying to create a regex that will reveal two sub strings within the given string and the returned value should be an array containing two elements, the two matched strings. I understand that my problem is closely linked to palindrome's which cannot be implemented as regex, but I'm hoping there is regex that will get close enough as there is a finite size structure I'm expecting to read.

To be very specific, I only care about matching the two top-level children as in the first example, any number of nested brackets inside don't matter at all, whether there is 1 or 99999 of them.

Note that spacing is just for easier readability and the input string will have no spaces. This structure is simply:

{ }{ }

and should be accepted as two strings:

{ } and { }

Contained within this, there can be any number of groupings of braces:

{ {} {} {} {} {} {} }{ {} }

and should be accepted as two strings:

{ {} {} {} {} {} {} } and { {} }

Contained within any of these inner groupings of braces can just be infinite recursive groupings like:

{{{{ }{{ }}{ }}}}{{ }{ }{ }}

and should be accepted as two strings:

{{{{ }{{ }}{ }}}} and {{ }{ }{ }}

I've thought of this problem for quite a while by myself and couldn't come up with a proper solution and there aren't any tools online I've found that have a visual way to see these two substrings, it always just matches the whole string. I've also used some regex creators like "http://regex.inginf.units.it/" and gave it maximum number of strings and all possible edge cases, etc, but only got like 40% accuracy. I'm hoping someone smarter than me on the subject that can come up with a regex to fit the answers to the bottom 7 examples and any other possible string constructed from the rules above.

I made a simple html to test my strings (just edit the "reg" variable in script tag to change your regex and view results with refreshing page:

var reg = /({({.*})*})/g;
var str1 = "{}{}";
var str2 = "{{}{}}{{}}";
var str3 = "{{{{{}{}{}{}}{{}}}}{}}{}";
var str4 = "{{{{{{{{{{{{{{{{{}}{{}}}}}{{}}}}}{{}}}}}{{}}}}}{{}}}}}{{}}";
var str5 = "{{}{{{{{{}{}}}}{{{{}{}}}{}}}}{}{{{}{{}}}}}{{{{{}}{{{{}{}}}}}}{{{{}}{{{{}{}}}}}}}";
var str6 = "{{}{}}{{}{{{}{}}}}";
var str7 = "{{}{}}{{{{{}}{{}}}}{{{}{}}}}";
var s1 = document.getElementById("d1").innerHTML = str1.match(reg);
var s2 = document.getElementById("d2").innerHTML = str2.match(reg);
var s3 = document.getElementById("d3").innerHTML = str3.match(reg);
var s4 = document.getElementById("d4").innerHTML = str4.match(reg);
var s5 = document.getElementById("d5").innerHTML = str5.match(reg);
var s6 = document.getElementById("d6").innerHTML = str6.match(reg);
var s7 = document.getElementById("d7").innerHTML = str7.match(reg);
<p id="d1"></p>
<p id="ans1">{},{}</p>
<p id="d2"></p>
<p id="ans2">{{}{}},{{}}</p>
<p id="d3"></p>
<p id="ans3">{{{{{}{}{}{}}{{}}}}{}},{}</p>
<p id="d4"></p>
<p id="ans4">{{{{{{{{{{{{{{{{{}}{{}}}}}{{}}}}}{{}}}}}{{}}}}}{{}}}}},{{}}</p>
<p id="d5"></p>
<p id="ans5">{{}{{{{{{}{}}}}{{{{}{}}}{}}}}{}{{{}{{}}}}},{{{{{}}{{{{}{}}}}}}{{{{}}{{{{}{}}}}}}}</p>
<p id="d6"></p>
<p id="ans6">{{}{}},{{}{{{}{}}}}</p>
<p id="d7"></p>
<p id="ans7">{{}{}},{{{{{}}{{}}}}{{{}{}}}}</p>
Avionix
  • 47
  • 8
  • 1
    Not possible with JS's native regexes. While tools may exist, you may well find it easier to use a non-regex solution. – CertainPerformance Feb 17 '19 at 08:20
  • 1
    What you are attempting to do is not what regex is meant to do. What you need is a parser – smac89 Feb 17 '19 at 08:26
  • Possible duplicate of [Can regular expressions be used to match nested patterns?](https://stackoverflow.com/questions/133601/can-regular-expressions-be-used-to-match-nested-patterns) – smac89 Feb 17 '19 at 08:32
  • Reason why it isn't a duplicate is because I only care about matching the two top-level brackets, no other amount of nesting will change that matching in any way. This is why my example regex includes " .* ", because whats inside the initial brackets is meaningless. – Avionix Feb 17 '19 at 08:39
  • But that would still require the regex engine to remember what level of nesting it is currently in, which regex can't do. – Sweeper Feb 17 '19 at 08:41
  • Do your strings contain anything else than those two characters? – trincot Feb 17 '19 at 08:48
  • Yes, the full implementation has info inside or outside any set of brackets, but I cut that out because it didn't matter, only the base structure of 2 sets of brackets I care to match. – Avionix Feb 17 '19 at 08:52
  • Are the strings guaranteed to be formatted correctly, with balanced braces and exactly two top-level groups? – trincot Feb 17 '19 at 08:55
  • Yes that is completely guaranteed. – Avionix Feb 17 '19 at 08:56

1 Answers1

3

Regex is not suitable for this task (at least the JS flavour isn’t). Anything that involves structures that can be arbitrarily nested is not suitable to be matched with regex. This is why they say you should not use regex to parse HTML or JSON. See this answer for more info.

The string you have here is quite simple to parse without using regex. By using regex you are kind of making life hard for yourself.

Here's how to parse this string (assuming the brackets are always balanced):

  • loop through the string
  • if you encounter an open brace, add one to a counter variable
  • if you encounter a closing brace, minus one
  • The first time the counter variable reaches 0, that is the end of the first substring, and the rest is the second substring.
Sweeper
  • 213,210
  • 22
  • 193
  • 313
  • I've already been using this exact implementation while I figured out the regex if there was one. – Avionix Feb 17 '19 at 08:44
  • @Avionix Why are you so fixated on regex? Regex would be a lot less efficient. – Sweeper Feb 17 '19 at 08:45
  • I had the impression that regex would be able to speed up a process like this. I understand just going through the string is O(n) time. – Avionix Feb 17 '19 at 08:47
  • @Avionix If you read the second paragraph of the second link, you'll see that regex is much much slower. And the nested braces _do_ matter even if you think they don't. You need them to know which level of nesting you are currently at. – Sweeper Feb 17 '19 at 08:50
  • Alright, I'm just going to accept this answer since it's already what I have implemented. Thanks. – Avionix Feb 17 '19 at 08:56
  • Some regex engines can handle nested structures just fine with recursive patterns, like in PHP - JS just doesn't happen to be one of them. While certain common types of nested structures are better parsed with parsers built for that sort of thing (like with HTML and JSON), I don't think that saying that regex should *never* be used for such a thing is wise. For example, the (very concise) pattern `{(?:(?R)| )*}` matches one of OP's nested structures, and I would much prefer using that pattern over manual index parsing of the string if the environment supported such a pattern. – CertainPerformance Feb 17 '19 at 22:16