How can I find code between two braces, respecting nesting?

Question

I am trying to get code between two braces, but still pay attention to nesting. Say I have something like the below as input:

while (true) {                    [A]
    dothis();
    if (whattype() == "A") {      [B]
        doA();
        if (other() == "dog") {   [C]
            doB();
        }                         [D]
    }                             [E]
    if (other() == "cat") {       [F]
        doZ();
    }                             [G]
}                                 [H]

And I want to recursively loop each nesting layer:

while
- if
  - if
- if

My current function takes the string, uses the regex (\{([\s\S]*)\}) to greedily match everything between the first and last brace, then repeats that on the captured contents until no braces remain.

The problem is that the regex does not work for blocks of code next to each other. It matches the text from B all the way to G. It should instead match from B to E, and then match a separate block from F to G.
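A minimal sketch of the failure described above, using shortened one-line versions of the blocks from the example (the variable names are only for illustration). It also shows why switching to a lazy quantifier does not help once blocks nest:

```javascript
// Contents of the outer while block, reduced from the example above.
const inner = [
  'dothis();',
  'if (whattype() == "A") { doA(); }',
  'if (other() == "cat") { doZ(); }',
].join('\n');

// Greedy: runs from the first "{" to the LAST "}" in the string,
// so both sibling blocks end up inside one capture.
const greedy = inner.match(/\{([\s\S]*)\}/)[1];

// Lazy: stops at the FIRST "}", which cuts a nested block short.
const nested = 'if (a) { if (b) { doB(); } doA(); }';
const lazy = nested.match(/\{([\s\S]*?)\}/)[1];

console.log(JSON.stringify(greedy));
console.log(JSON.stringify(lazy));
```

Neither quantifier can pair each opening brace with its own closing brace, because the pattern has no way to track depth.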

Edit: I may end up using something other than regex. Are there any suggestions on how to handle this?

For future readers:

What I found helpful was this answer from another SO question.

Counting braces/parentheses is one of the classic examples of "things you can't do with regular expressions". — Pointy, Feb 05 '15 at 01:35
@Pointy I may end up using something else then, do you have suggestion on how to do this logic-wise? — aNewStart847, Feb 05 '15 at 01:41
loop through each line. Add 1 to a variable when `{` is reached and sub 1 from the var where `}` is reached. — Avinash Raj, Feb 05 '15 at 01:43
Well what you do depends on what you know about your input. If you know what it will generally look like, and (more important) know that certain things can't happen, then you can use a simple approach like what @AvinashRaj suggests. In the general case, when your code has to deal with *any* source code, your only reliable course of action is to use a full-blown JavaScript parser (assuming that's the language you're working on). — Pointy, Feb 05 '15 at 01:46
I had something similar to what @AvinashRaj suggested at first, but I wasn't able to get it to work recursively. I may try to get something working from that side again though. I got a new approach in mind. Thank you both, — aNewStart847, Feb 05 '15 at 01:48
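The counter idea from the comments can be made recursive by replacing the counter with a stack, so each block is attached to its parent. This is only a sketch under a strong assumption: every `{` and `}` in the input is a real block delimiter (no braces inside strings, comments, or regex literals). The names `findBlocks` and `blockBody` are made up for this example.

```javascript
// Hypothetical helper: builds a tree of brace-delimited blocks.
// Assumes every "{" and "}" is a real block delimiter.
function findBlocks(source) {
  const root = { start: 0, end: source.length, children: [] };
  const stack = [root]; // innermost open block is on top

  for (let i = 0; i < source.length; i++) {
    const ch = source[i];
    if (ch === '{') {
      // Open a new block as a child of the current innermost block.
      const block = { start: i, end: -1, children: [] };
      stack[stack.length - 1].children.push(block);
      stack.push(block);
    } else if (ch === '}') {
      if (stack.length === 1) {
        throw new SyntaxError(`Unmatched "}" at index ${i}`);
      }
      // Close the innermost block and record where it ends.
      stack.pop().end = i;
    }
  }
  if (stack.length > 1) {
    throw new SyntaxError('Unclosed "{"');
  }
  return root.children;
}

// Text between a block's braces (braces themselves excluded).
function blockBody(source, block) {
  return source.slice(block.start + 1, block.end);
}

// Usage: walk the tree recursively, one nesting layer at a time.
function walk(source, blocks, depth = 0) {
  for (const block of blocks) {
    console.log(`${'  '.repeat(depth)}- block at ${block.start}..${block.end}`);
    walk(source, block.children, depth + 1);
  }
}

const sample = 'while (true) { if (a) { if (b) { } } if (c) { } }';
walk(sample, findBlocks(sample));
```

The stack depth plays the role of the counter, and sibling blocks such as B..E and F..G come out as separate children of the same parent.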

score 3 · Answer 1 · answered Feb 05 '15 at 02:09

This type of problem can't be solved by a regex that consumes a whole block.

What you describe does require complete and correct tokenization of the JavaScript language. Consider, for example, that you might have braces contained inside quoted text... Unless you see more benefit in trying to do it all by yourself than in actually succeeding in reasonable time (say, because you are playing around to understand how parsers work), you should definitely have a look at some existing JS-in-JS parsers. See for example: http://marijnhaverbeke.nl/blog/acorn.html (note that this is the first result Google gave me; I have never tried that library).
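To illustrate the quoted-text problem, here is a rough sketch of a scanner that pairs braces while skipping string literals and comments. It is deliberately not a complete JavaScript tokenizer: regex literals and template-literal `${...}` substitutions are not handled, which is exactly the kind of gap an existing parser closes. The function name `matchBraces` is hypothetical.

```javascript
// Hypothetical sketch: pair up braces, ignoring strings and comments.
// NOT a full tokenizer: regex literals and `${...}` are not handled.
function matchBraces(source) {
  const pairs = []; // [openIndex, closeIndex], in the order blocks close
  const open = [];  // stack of indexes of still-unmatched "{"
  let i = 0;

  while (i < source.length) {
    const ch = source[i];
    const next = source[i + 1];

    if (ch === '"' || ch === "'" || ch === '`') {
      // String literal: advance to the matching quote, honoring escapes.
      i++;
      while (i < source.length && source[i] !== ch) {
        if (source[i] === '\\') i++; // skip the escaped character
        i++;
      }
    } else if (ch === '/' && next === '/') {
      // Line comment: advance to the end of the line.
      while (i < source.length && source[i] !== '\n') i++;
    } else if (ch === '/' && next === '*') {
      // Block comment: jump to the "/" of the closing "*/".
      const close = source.indexOf('*/', i + 2);
      i = close === -1 ? source.length : close + 1;
    } else if (ch === '{') {
      open.push(i);
    } else if (ch === '}') {
      if (open.length === 0) {
        throw new SyntaxError(`Unmatched "}" at index ${i}`);
      }
      pairs.push([open.pop(), i]);
    }
    i++;
  }

  if (open.length > 0) {
    throw new SyntaxError(`Unclosed "{" at index ${open.pop()}`);
  }
  return pairs;
}

// Usage: braces inside the string and the comments are ignored.
const tricky = 'if (x == "}") { log("{"); /* } */ } // {';
for (const [start, end] of matchBraces(tricky)) {
  console.log(tricky.slice(start, end + 1));
}
```

Each rule added here (quotes, escapes, two comment styles) is a small piece of tokenization, which gives a feel for how much a correct tokenizer for the full language has to cover.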

James

Though this is part of a larger project, I am also doing this to learn how parsers work. I have looked at [jison](http://zaach.github.io/jison/), but I'll also check around for more parsers. – aNewStart847 Feb 05 '15 at 02:18
Great, language parsing is a very interesting subject. One of my favourite subjects, actually. But believe me: the JavaScript language is tough to parse correctly. Writing a custom parser (and worse, your first ever) for it might compromise your whole project's chance of success. So don't mix concerns there... – James Feb 05 '15 at 02:28
On a side note, to get yourself started with writing parsers, you might want to have a look at Eclipse's Xtext. It is based on Java (I don't know if you have experience there). The very interesting facts, particularly for those with little prior knowledge of parser construction, are that a) it covers all aspects of a language definition from a single, well-integrated workflow, and b) it is well documented from a language developer's point of view, including several videos on how to implement various sample languages... – James Feb 05 '15 at 02:39
You don't need a real parser to do this. You *do* need the tokens that make up the language. With that, you can build a loop that increments/decrements a counter on each { ... } and [ ... ]. – Ira Baxter Feb 05 '15 at 03:14
Indeed, as mentioned by Ira Baxter, and as I said in my answer, you need complete and correct tokenization of the language, not a full parser. The discussion of complete parsers was only a side note. – James Feb 05 '15 at 12:27
