93

I've recently being trying to teach myself how parsers (for languages/context-free grammars) work, and most of it seems to be making sense, except for one thing. I'm focusing my attention in particular on LL(k) grammars, for which the two main algorithms seem to be the LL parser (using stack/parse table) and the Recursive Descent parser (simply using recursion).

As far as I can see, the recursive descent algorithm works on all LL(k) grammars and possibly more, whereas an LL parser works on all LL(k) grammars. A recursive descent parser is clearly much simpler than an LL parser to implement, however (just as an LL one is simpler than an LR one).

So my question is, what are the advantages/problems one might encounter when using either of the algorithms? Why might one ever pick LL over recursive descent, given that it works on the same set of grammars and is trickier to implement?

user200783
  • 13,722
  • 12
  • 69
  • 135
Noldorin
  • 144,213
  • 56
  • 264
  • 302
  • Hey, can you please explain. You write _seem to be the LL parser (using stack/parse table) and the Recursive Descent parser (simply using recursion)._. [This answer](https://stackoverflow.com/a/45977727/2545680) states that all top-down parsers are LL parsers. I'm confused – Max Koretskyi Aug 31 '17 at 16:15
  • 2
    @MaximKoretskyi That's definitely not true. LL parsers are a subset of top-down parsers. – Noldorin Aug 31 '17 at 18:07
  • thanks, can you maybe please post your answer to my question? – Max Koretskyi Aug 31 '17 at 18:52

1 Answers1

120

LL is usually a more efficient parsing technique than recursive-descent. In fact, a naive recursive-descent parser will actually be O(k^n) (where n is the input size) in the worst case. Some techniques such as memoization (which yields a Packrat parser) can improve this as well as extend the class of grammars accepted by the parser, but there is always a space tradeoff. LL parsers are (to my knowledge) always linear time.

On the flip side, you are correct in your intuition that recursive-descent parsers can handle a greater class of grammars than LL. Recursive-descent can handle any grammar which is LL(*) (that is, unlimited lookahead) as well as a small set of ambiguous grammars. This is because recursive-descent is actually a directly-encoded implementation of PEGs, or Parser Expression Grammar(s). Specifically, the disjunctive operator (a | b) is not commutative, meaning that a | b does not equal b | a. A recursive-descent parser will try each alternative in order. So if a matches the input, it will succeed even if b would have matched the input. This allows classic "longest match" ambiguities like the dangling else problem to be handled simply by ordering disjunctions correctly.

With all of that said, it is possible to implement an LL(k) parser using recursive-descent so that it runs in linear time. This is done by essentially inlining the predict sets so that each parse routine determines the appropriate production for a given input in constant time. Unfortunately, such a technique eliminates an entire class of grammars from being handled. Once we get into predictive parsing, problems like dangling else are no longer solvable with such ease.

As for why LL would be chosen over recursive-descent, it's mainly a question of efficiency and maintainability. Recursive-descent parsers are markedly easier to implement, but they're usually harder to maintain since the grammar they represent does not exist in any declarative form. Most non-trivial parser use-cases employ a parser generator such as ANTLR or Bison. With such tools, it really doesn't matter if the algorithm is directly-encoded recursive-descent or table-driven LL(k).

As a matter of interest, it is also worth looking into recursive-ascent, which is a parsing algorithm directly encoded after the fashion of recursive-descent, but capable of handling any LALR grammar. I would also dig into parser combinators, which are a functional way of composing recursive-descent parsers together.

Benjamin Hodgson
  • 42,952
  • 15
  • 108
  • 157
Daniel Spiewak
  • 54,515
  • 14
  • 108
  • 120
  • 3
    Very much the response I was hoping for! :) Thanks for all the info there (including last bit, of which I wasn't even aware). It will probably take a bit more reading before I understand all the concepts you've presented in this answer, but you've certainly answered my question and pointed me in the right direction for further study. The main thing I'm fuzzy abuot now is how PEGs relate to recursive descent parsers, and also how exactly a parser combinator combines various parsers. If you could clarify either/both of these, I would be very grateful. – Noldorin Jun 25 '09 at 16:04
  • Also, just to confirm; is "inlining the predict sets" effectively predictive parsing? In this case, what precisely is the "entire class of grammars"? – Noldorin Jun 26 '09 at 00:20
  • A PEG is the formal-ish description of a recursive-descent parser. Since recursive-descent isn't actually LL parsing, a different model was needed. You can kind of think of them like LALR and Bison, one being the formalism, the other being the implementation. – Daniel Spiewak Jun 26 '09 at 17:44
  • 6
    Predict sets: it really depends on the strategy used. If you *exclusively* rely on those predict sets and perform no backtracking, then the class of grammars is precisely LL(k), where k is the amount of lookahead used to compute the predict sets. However, you can get a lot of the benefits of predictive parsing by inlining predict sets in R-D without completely eliminating backtracking. This allows parsers which accept all grammars normally handled by R-D, but with a faster average case. Unfortunately, the worst-case for such parsers is still exponential. – Daniel Spiewak Jun 26 '09 at 17:46
  • 5
    Most recursive-descent parsers (even hand-written ones) will use inlined predict sets to *limit* the alternates, restricting backtracking without constraining flexibility. The end result is a parser which is nearly linear-time for everything but the most pathological grammars, and which still accepts the entire class of PEGs. – Daniel Spiewak Jun 26 '09 at 17:48
  • 5
    Good stuff. One nitpick though: "most nontrivial use cases are implemented in some sort of parser generator...". That's not true. Most widely used compilers and IDEs (C#, VB, Visual C++, and GCC being good examples) use hand written parsers. Those are arguably some of the most non-trivial uses. – Scott Wisniewski Jul 04 '10 at 23:55
  • 1
    Very true (well, almost; GCC uses Bison). Java famously uses a hand-written parser (though Java 7 has a rewritten parser based on ANTLR), as does Scala, Python, Haskell, and many more. However, I think if you did a full roll call, including the likes of C/C++, Ruby, etc, I think you would find that the parser generator crowd wins overall. – Daniel Spiewak Jul 05 '10 at 03:44
  • Can you please give an example of RD parser that is exponential? – Giovanni Funchal Jul 23 '10 at 08:55
  • 5
    @DanielSpiewak I know you posted that comment years ago, but even back then it was wrong ;) GCC used bison for the C-parser untill 3.x, but switched to a hand-written recursive descent parser in 2008ish as far as I can tell from this http://gcc.gnu.org/wiki/New_C_Parser – Morten Jensen Mar 12 '13 at 23:18
  • 1
    I can confirm that LL parsers *are* always linear-time (unless you're *intentionally* trying to make it non-linear-time, of course; slowing things down usually isn't hard!). That's because there is never any backtracking whatsoever (in other words, a `LL(k)` parser always "knows" exactly which alternative to pick from just `k` lookahead symbols). `LL(1)` is the usual case, where you only have one such token. – Tim Čas Feb 11 '15 at 00:24
  • @Daniel Spiewak I don't think Python uses a hand-written parser. But it does use a custom parser generator written by GvR. It was the first code he wrote for Python. (I think Python's lexical syntax is technically not LL(k), due to the need to track arbitrary levels of indentation, but the high-level grammar is. Unsure.) – Jason Orendorff Feb 23 '16 at 16:44
  • 2
    @DanielSpiewak "LL is usually a more efficient parsing technique than recursive-descent." Is that true? When parsing an LL grammar, recursive descent never needs to backtrack, right? (I maintain a recursive descent parser; it only backtracks in places where the grammar is decidedly non-LL.) – Jason Orendorff Feb 23 '16 at 16:54
  • 1
    Agree with @JasonOrendorff. A relative statement like that only makes sense if you compare the algorithms on the same input sets. The fact that recursive descent may be very slow on grammars that are not LL(k) is irrelevant for a direct comparison. – Raphael Apr 24 '17 at 18:29
  • LL is not a parsing technique. It is a grammar class. A recursive-descent parser can be written conforming to an LL(1) grammar. There is a table-driven LL parsing algorithm, which is what you appear tbe discussing. – user207421 Sep 01 '17 at 00:14