I'm working on a JavaScript-based project that involves a rudimentary Bash-inspired scripting system, and I'm using a regex to separate lines into (a number of types of) tokens.
One such token class is of course the recursive $()
construct. This construct can be nested arbitrarily. I am trying to devise a JavaScript regular expression to match this type of token without accidentally leaving parts behind or grabbing parts of other tokens.
The problem, more specifically:
Given a string such as this example:
"$(foo $(bar) fizz $(buzz)) $(something $(else))"
which is made up of individual tokens, each delimited by an outer $() and followed either by whitespace or end-of-string,
match the first such token in the string from and including its open $( to and including its final closing )
In the event that an unbalanced construct occurs anywhere in the string, the behavior of this regex is considered undefined and does not matter.
So in the above example the regex should match "$(foo $(bar) fizz $(buzz))"
Further usage details:
Both the input string and the returned match are passed through String.prototype.trim()
, so leading and trailing whitespace doesn't matter
I am able to deem unbalanced constructs an undefined case because the code that consumes this type of token once extracted does its own balance checking. Even if the regex returns a match that is surrounded by an outer $(), the error will eventually be caught elsewhere.
What I've tried so-far
For a while I've been using this regex:
/\$\(.*?\)(?!(?:(?!\$\().)*\))(?:\s+|$)/
which seemed to work for quite some time. It matches arbitrarily nested balanced constructs so long as they do not have multiple nests at the same level. That case just slipped past my mind somehow when testing initially. It consumes the contents of the token with lazy repetition, and asserts that after the closing paren there is not another close paren before there is an open $(. This is of course broken by tokens such as the example above.
I know that the traditional "balanced constructs regex problem" is not solvable without subroutines/recursion. However I'm hoping that since I only need to match balanced constructs and not fail to match imbalanced ones, that there's some clever way to cheat here.