I define a nested block as something with separate opening and closing characters, e.g. `{` and `}`, `[` and `]`, etc. An algorithm must ignore opening and closing characters if they are enclosed in a delimiter (such as `'{'` or `"{"`), or explicitly escaped, such as in a comment block.
This is not homework (I'm not a student) or idle speculation. My immediate goal is to alphabetically reorder function declarations in ActionScript code files to assist in debugging and comparing different versions. But the real question, and the one more useful to other readers, is the general algorithm described above. In my case the plug-in parameters are just opening = `{`, closing = `}`, delimiter = `"`, escape = `//` through end of line.
Please see the following for existing questions that explain why regular expressions are not an option for parsing arbitrarily deep nested expressions:
- Can regular expressions be used to match nested patterns?
- How can I parse nested blocks using Regex? [closed]
The obvious blunt solution is to chug through character by character, building a context stack and state variables ("inQuote", "inComment", etc.). I've done this before. I'm just wondering whether there is a more formal or efficient solution, or whether this is irreducible.
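
For concreteness, here is a minimal sketch of that blunt approach, written in Python rather than ActionScript. The function name `find_blocks` and its parameter names are hypothetical, chosen to mirror the plug-in parameters above. It also assumes that a backslash escapes the next character inside a quoted string, that the comment marker is non-empty, and that comments run to the end of the line; none of that is a claim about how any particular tool does it.

```python
def find_blocks(text, opening="{", closing="}", delimiter='"', comment="//"):
    """Return (start, end) index pairs of the top-level blocks in text.

    start is the index of an opening character and end is the index of
    its matching closing character. Anything inside a delimited string
    or a line comment is ignored.
    """
    blocks = []
    stack = []          # indices of opening characters not yet closed
    in_quote = False
    in_comment = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_comment:
            # A line comment ends at the next newline.
            if ch == "\n":
                in_comment = False
        elif in_quote:
            if ch == "\\":
                i += 1  # assumption: backslash escapes the next character
            elif ch == delimiter:
                in_quote = False
        elif text.startswith(comment, i):
            in_comment = True
            i += len(comment) - 1  # skip the rest of the comment marker
        elif ch == delimiter:
            in_quote = True
        elif ch == opening:
            stack.append(i)
        elif ch == closing:
            if not stack:
                raise ValueError(f"unmatched {closing!r} at index {i}")
            start = stack.pop()
            if not stack:  # depth is back to zero: a top-level block ended
                blocks.append((start, i))
        i += 1
    if stack:
        raise ValueError(f"unclosed {opening!r} at index {stack[-1]}")
    return blocks


source = 'function b() { var s = "}"; // }\n  if (x) { y(); }\n}\nfunction a() { }'
bodies = [source[start:end + 1] for start, end in find_blocks(source)]
```

With a single opening/closing pair, as here, the stack could be replaced by an integer depth counter plus the index of the outermost opening character; the stack only becomes necessary when several bracket kinds must be matched against each other.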