I'm running into a problem parsing nested blocks with pest.
Specifically, I would like to parse a small, defined subset of LaTeX, and I'm having trouble correctly parsing nested \begin{} and \end{} pairs:
\begin{document}
Some text.
\begin{quote}
A quote.
\end{quote}
\end{document}
The expected parse tree would be something along the lines of:
- env
- - expression
- - env
- - - expression
But what I get instead is that the inner \begin{quote} block gets parsed as a command statement rather than as an environment (block) statement:
- env
- - expression
- - command
- - expression
- - command
Here's the grammar:
// Top level rule is `document`.
document = {
SOI ~
(section? ~ newline_char)* ~ section? ~
EOI
}
section = {
env_stmt |
cmd_stmt |
expression
}
// Expression grammar
expression = { ( cmd_stmt | literal )* }
literal = @{ char+ }
char = @{ ASCII_ALPHANUMERIC | punctuation }
punctuation = {
"," | "." | ";" | "(" | ")" | "[" | "]" | "|" | "<" | ">" | ":"
}
// Control Statement Grammar
cmd_stmt = { ctrl_character ~ name ~ cmd_stmt_opt? ~ "{" ~ expression ~ "}" }
cmd_stmt_opt = { "[" ~ name ~ "]" }
name = @{ ASCII_ALPHA+ }
COMMENT = _{ "%" ~ (!newline_char ~ ANY)* ~ newline_char }
WHITESPACE = _{ " " }
newline_char = _{"\n"}
ctrl_character = _{ "\\" }
// Environment Grammar
env_stmt = { env_begin ~ env_content ~ env_end }
env_content = { (section? ~ newline_char)* }
env_begin = @{ ctrl_character ~ "begin" ~ "{" ~ PUSH(name) ~ "}" }
env_end = @{ ctrl_character ~ "end" ~ "{" ~ PEEK ~ "}" }
I'm guessing it has something to do with PUSH/PEEK. When I remove them, only the inner block gets parsed as an environment (env_stmt), and the outer one is parsed as a command statement (cmd_stmt).
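To make the intended behaviour concrete, here is a minimal hand-written sketch in plain Rust (no pest involved) of the stack discipline I'm trying to express: push the environment name on \begin{name}, and require \end{name} to match the name on top of the stack. The `Node` type and the `env_name` and `parse_envs` functions are hypothetical names made up for this illustration. They are not part of my grammar or of pest's API, and the sketch only handles one statement per line.

```rust
/// Hypothetical tree node, used only for this illustration.
#[derive(Debug, PartialEq)]
enum Node {
    /// An environment: its name plus the nodes between \begin{name} and \end{name}.
    Env(String, Vec<Node>),
    /// Any other non-empty line, kept verbatim.
    Expression(String),
}

/// Extracts `name` from a line of the form `<prefix>name}`, e.g. `\begin{quote}`.
fn env_name<'a>(line: &'a str, prefix: &str) -> Option<&'a str> {
    line.strip_prefix(prefix)?.strip_suffix('}')
}

/// Builds a tree from line-oriented input using an explicit stack of open environments.
fn parse_envs(input: &str) -> Result<Vec<Node>, String> {
    // Each entry holds an open environment name and the children collected so far.
    // The bottom entry (empty name) collects the top-level nodes.
    let mut stack: Vec<(String, Vec<Node>)> = vec![(String::new(), Vec::new())];

    for line in input.lines().map(str::trim).filter(|l| !l.is_empty()) {
        if let Some(name) = env_name(line, "\\begin{") {
            // \begin{name}: remember the name, similar to PUSH(name) in the grammar.
            stack.push((name.to_string(), Vec::new()));
        } else if let Some(name) = env_name(line, "\\end{") {
            // \end{name}: the name must equal the top of the stack, which is then removed.
            if stack.len() < 2 {
                return Err(format!("unexpected \\end{{{}}}", name));
            }
            let (open, children) = stack.pop().unwrap();
            if open != name {
                return Err(format!("\\end{{{}}} closes \\begin{{{}}}", name, open));
            }
            stack.last_mut().unwrap().1.push(Node::Env(open, children));
        } else {
            // Everything else is treated as a plain expression line.
            stack.last_mut().unwrap().1.push(Node::Expression(line.to_string()));
        }
    }

    if stack.len() != 1 {
        return Err(format!("unclosed \\begin{{{}}}", stack.last().unwrap().0));
    }
    Ok(stack.pop().unwrap().1)
}
```

Calling `parse_envs` on the sample document above yields a single `Env("document", ...)` node containing an `Expression` and a nested `Env("quote", ...)`, which is the tree shape I expect from the grammar. One difference I'm aware of: this sketch removes the name from the stack on \end, whereas my grammar uses PEEK, which as far as I understand leaves the name on the stack. I don't know whether that is related to the problem.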
Could someone please point me towards what I'm doing wrong?