0

I define a nested block as something with separate opening and closing characters. I.e. { and }, [ and ], etc. An algorithm must ignore opening and closing characters if they are enclosed in a delimiter (such as '{' or "{"), or explicitly escaped such as in a comment block.

This is not homework (I'm not a student) or idle speculation. My immediate goal is to alphabetically reorder function declarations in ActionScript code files to assist in debugging / comparing different versions. But the real question and more useful for other readers is the general algorithm as described above. In my case the plug-in parameters are just opening = {, closing = }, delimiter = ", escape = //..[end of line].

Please see the following for existing questions that explain why regular expressions are not an option for parsing arbitrarily deep nested expressions:

The obvious blunt solution is to chug through character-by-character and build a context stack and state variables ("inQuote", "inComment", etc). I've done this before. I'm just wondering if there is a more formal or efficient solution; or if this is irreducable.

Community
  • 1
  • 1
Joshua Honig
  • 12,925
  • 8
  • 53
  • 75
  • 2
    Regarding more "formal" solution: looks like this is parsing task. Take a look at Boost Spirit library. I am not sure, however, that it performs better than home-made (and possibly less general) parsing algorithm. – Alex F Sep 26 '11 at 13:15
  • There's no way to avoid iterating over every character, because you have to check if each character is the start/end of a block. Given that, any other solution is going to be better by at best a constant factor. – Nick Johnson Sep 27 '11 at 01:50

1 Answers1

1

Depending on what you want to do you can either go with your context stack solution or you can build expression trees from the statements. From what I'm reading it looks like the context stack is the easiest to use if you just want some simple parsing but if you require complicated expression handling or just multi level expression handling then you may want to consider an expression tree otherwise known as a parse tree (http://en.wikipedia.org/wiki/Parse_tree). In your case the tree will not get very complicated if done right. As I said though from what I read it shouldn't be complicated as long as you can get the entire function block and encapsulate that somehow and just sort by name.

Jesus Ramos
  • 22,940
  • 10
  • 58
  • 88