I'm stuck on parsing nested blocks.

Specifically, I want to parse a small, defined subset of LaTeX, and I'm having trouble correctly parsing nested \begin{} and \end{} pairs:

\begin{document}

Some text.

\begin{quote}
  A quote.
\end{quote}


\end{document}

The expected parse tree would be something along the lines of:

- env
- - expression
- - env
- - - expression
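
To make the expected nesting concrete, here is a minimal hand-written sketch in plain Rust (std only, no pest). It is not my real parser and not how pest works internally. `Node` and `parse_envs` are hypothetical names invented for this illustration, and it assumes every \begin{...} and \end{...} sits on its own line. It only shows the tree shape I'm after: each \end{name} closes the most recently opened \begin{name}.

```rust
// Illustration only: a line-oriented, stack-based matcher for \begin/\end pairs.
// `Node` and `parse_envs` are hypothetical names, not part of pest or my grammar.

#[derive(Debug, PartialEq)]
enum Node {
    Env { name: String, children: Vec<Node> },
    Expression(String),
}

fn parse_envs(input: &str) -> Result<Vec<Node>, String> {
    // Each frame is an open environment: (name, children collected so far).
    // The bottom frame has an empty name and collects the top-level nodes.
    let mut stack: Vec<(String, Vec<Node>)> = vec![(String::new(), Vec::new())];

    for line in input.lines() {
        let line = line.trim();
        if line.is_empty() {
            continue; // blank lines carry no content in this sketch
        }
        if let Some(name) = line.strip_prefix("\\begin{").and_then(|r| r.strip_suffix('}')) {
            // Opening an environment pushes a new frame.
            stack.push((name.to_string(), Vec::new()));
        } else if let Some(name) = line.strip_prefix("\\end{").and_then(|r| r.strip_suffix('}')) {
            // Closing an environment pops the innermost frame and checks the name.
            if stack.len() < 2 {
                return Err(format!("unexpected \\end{{{}}}", name));
            }
            let (open, children) = stack.pop().unwrap();
            if open != name {
                return Err(format!("\\end{{{}}} does not match \\begin{{{}}}", name, open));
            }
            // The finished environment becomes a child of its parent frame.
            stack.last_mut().unwrap().1.push(Node::Env { name: open, children });
        } else {
            // Anything else is treated as plain expression text.
            stack.last_mut().unwrap().1.push(Node::Expression(line.to_string()));
        }
    }

    if stack.len() != 1 {
        return Err(format!("unclosed \\begin{{{}}}", stack.last().unwrap().0));
    }
    let (_, nodes) = stack.pop().unwrap();
    Ok(nodes)
}
```

Run on the sample document above, `parse_envs` should return a single `Env` named `document` whose children are an `Expression` followed by a nested `Env` named `quote`, which is the shape I'd like the pest grammar to produce.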

Instead, the inner \begin{quote} block is parsed as a command statement rather than as an environment (block) statement:

- env
- - expression
- - command
- - expression
- - command

Here's the grammar:

// Top level rule is `document`.
document = { 
    SOI ~
    (section? ~ newline_char)* ~ section? ~
    EOI
}

section = {
    env_stmt |
    cmd_stmt |
    expression
}

// Expression grammar
expression   = { ( cmd_stmt | literal )* }
literal      = @{ char+ }
char         = @{ ASCII_ALPHANUMERIC | punctuation }
punctuation  = {
    "," | "." | ";" | "(" | ")" | "[" | "]" | "|" | "<" | ">" | ":" 
}

// Control Statement Grammar
cmd_stmt = { ctrl_character ~ name ~ cmd_stmt_opt? ~ "{" ~ expression ~ "}" }
cmd_stmt_opt = { "[" ~ name ~ "]" }

name           = @{ ASCII_ALPHA+ }
COMMENT        = _{ "%" ~ (!newline_char ~ ANY)* ~ newline_char }
WHITESPACE     = _{ " " }
newline_char   = _{ "\n" }
ctrl_character = _{ "\\" }

// Environment Grammar
env_stmt     = { env_begin ~ env_content ~ env_end }
env_content  = { (section? ~ newline_char)* }
env_begin    = @{ ctrl_character ~ "begin" ~ "{" ~ PUSH(name) ~ "}" }
env_end      = @{ ctrl_character ~ "end" ~ "{" ~ PEEK ~ "}" }

I'm guessing it has something to do with PUSH/PEEK. When I remove them, only the inner block is parsed as an environment; the outer one is parsed as a control statement (cmd_stmt).

Could someone please point me towards what I'm doing wrong?
