
I just learned the theory behind LL(1) parsers, and subsequently I learned that this parsing technique is only suitable for a subset of context-free grammars. LL(1) is predictive and doesn't do any backtracking. Does the real world actually need backtracking capabilities? LL(1) seems like a fair introduction to parsing, but not like something anyone would use to "get a job done". For example: you have to left-factor your grammar, and even then LL(1) isn't suited to parsing a formatting language like the one Wikipedia uses, where some rules, such as headings, rely on reduplication. (Correct me if I'm wrong!)
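To make the left-factoring complaint concrete, here is a minimal sketch (my own toy grammar, not one from any real language): the two statement forms `id = id` and `id ( id )` share the prefix `id`, so a predictive parser must factor it out and then a single token of lookahead decides the branch.

```python
def parse_stmt(tokens):
    """Predictive (LL(1)) parse of the left-factored toy grammar:
         stmt      -> IDENT stmt_rest
         stmt_rest -> '=' IDENT | '(' IDENT ')'
    """
    toks = list(tokens)
    pos = 0

    def peek():
        return toks[pos] if pos < len(toks) else None

    def eat(expected):
        nonlocal pos
        if peek() != expected:
            raise SyntaxError(f"expected {expected!r}, got {peek()!r}")
        pos += 1

    # Both original alternatives began with IDENT; the shared prefix
    # was factored out, so we consume it unconditionally.
    name = peek()
    eat(name)
    if peek() == '=':          # one token of lookahead picks the branch
        eat('=')
        value = peek()
        eat(value)
        return ('assign', name, value)
    elif peek() == '(':
        eat('(')
        arg = peek()
        eat(arg)
        eat(')')
        return ('call', name, arg)
    raise SyntaxError(f"unexpected token {peek()!r}")
```

Without the factoring, the parser would have to commit to `assign` or `call` before seeing the second token, which is exactly where backtracking (or more lookahead) would otherwise come in.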

But what parsing technique is used in the real world if you want to get things done?

Probably the answer involves some kind of compromise. Speed is not much of an issue yet.

With "real world" abilities I mean that the parser will parse the whole set of context-free-grammars, if possible without any special-snowflake treatment like left-factoring. I further think of fictitious situations where I should somehow wind up writing back-end stuff for a game project, if the task then would haphazardly be to give a scripting-language to the designers, what parser would meet the requirements without on the other hand being too... "esoteric…" and "complex" - or for short: being an overkill.

I hope that suffices to give you an idea what I mean.

Thank you very much!

von spotz
  • What parser you use is determined by what you want to parse! Different techniques are suited to (and support) different language classes, and have different requirements for handling ambiguities. For general context-free grammars (such as most programming languages), LALR and GLR parsing are very common. I (and a small but growing number of other compiler designers) am of the opinion that chart parsers (for example the Aycock/Horspool optimizations to Earley's parser) are best-of-all-worlds on modern hardware and should be used almost everywhere these days in new implementations. – BadZen Dec 06 '16 at 23:10
  • (In particular if you want to support the full class of CF grammars with a minimum of fuss, check out http://staff.icar.cnr.it/ruffolo/progetti/projects/10.Parsing%20Earley/2002-Practical%20Earley%20Parsing-10.1.1.12.4254.pdf or implementations thereof. It's also *way easier to implement* than most other types of parser.) – BadZen Dec 06 '16 at 23:13
  • LALR parsers are typically just as fast as (if not faster than) LL parsers, and are powerful enough to handle most languages (although I take @BadZen's point about GLR parsing). The only valid excuse for top-down parsing is that it is slightly less difficult to produce good diagnostic messages; it has nothing to do with speed or esotericism. If it were me teaching parsing techniques, I'd skip over the LL stuff and dive straight into practical parsing. – rici Dec 06 '16 at 23:14
  • @rici - Good point re: errors as quality attribute / feature. Let's also add incremental parsing (such as an IDE needs to do when a single token/character is added or deleted) to that list! (For that latter thing - that chart parsing I'm advocating... is not so hot.) – BadZen Dec 06 '16 at 23:17
  • Hey guys, thanks for your answers! En passant you even sharpened my question with great expressions and keywords: "practical parsing", "support the full class of CF grammars with a minimum of fuss". I will certainly look into Earley's method. Do you mind if I bother you with a few other questions? So LR is already well suited to parse the whole class of CFGs? And LALR is even more powerful, the most powerful of the LR species? What I've read about LALR so far is that it only makes the parsing table smaller / parses faster, with no other advantages. – von spotz Dec 07 '16 at 09:17
  • Regarding my question on the class of LR parsers my questions have (so far) been answered in this thread [link](http://stackoverflow.com/questions/2676144/what-is-the-difference-between-lr-slr-and-lalr-parsers) – von spotz Dec 07 '16 at 10:11
  • Please allow me another question (at this point). Do more powerful parsers necessarily require backtracking? And: Is it (made) known, what kind of parsing technique stackoverflow uses for its comment formatting ? – von spotz Dec 07 '16 at 10:47
  • "Backtracking" is a method a specific type of parser may implement - not a parser or grammar class. Descent or predictive parsers, when faced with "equally good" choices (this happens if the grammar is not LL[k]), may choose wrong - ie. in a way that leads to a non-matching parse tree / failed parse. They "backtrack" to the point they had to make a choice, in that case, to choose differently. Other types of parser don't work that way, so the concept of "backtracking" doesn't make any sense. In this sense the answer to your question is "no". – BadZen Dec 07 '16 at 21:36
  • (As an example, the Earley parser algorithm I linked the paper to above does not "backtrack" because it never chooses - it effectively keeps track of all possible choice trees simultaneously... yet it will parse any CF grammar.) – BadZen Dec 07 '16 at 21:37
  • (SO is closed-source, so we don't know what Markdown parser they use here. By the way, in general, additional or follow-on questions should go in separate posts, where they are relevant...) – BadZen Dec 07 '16 at 21:41
  • sure, thank you very much :) – von spotz Dec 08 '16 at 10:39
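To illustrate BadZen's point that an Earley parser never backtracks because it tracks all choices simultaneously, here is a compact Earley recognizer sketch (my own illustration, not the optimized Aycock/Horspool version from the linked paper; it also assumes the grammar has no epsilon productions, which is exactly the case their optimizations handle):

```python
def earley_recognize(grammar, start, tokens):
    """Earley recognizer. `grammar` maps each nonterminal to a list of
    alternatives (tuples of symbols); a symbol is a nonterminal iff it
    is a key of `grammar`. An item is (lhs, rhs, dot, origin)."""
    chart = [set() for _ in range(len(tokens) + 1)]
    for rhs in grammar[start]:
        chart[0].add((start, rhs, 0, 0))
    for i in range(len(tokens) + 1):
        worklist = list(chart[i])
        while worklist:
            lhs, rhs, dot, origin = worklist.pop()
            if dot < len(rhs):
                sym = rhs[dot]
                if sym in grammar:
                    # Predictor: expand the nonterminal after the dot.
                    for alt in grammar[sym]:
                        new = (sym, alt, 0, i)
                        if new not in chart[i]:
                            chart[i].add(new)
                            worklist.append(new)
                elif i < len(tokens) and tokens[i] == sym:
                    # Scanner: the terminal matches the next token.
                    chart[i + 1].add((lhs, rhs, dot + 1, origin))
            else:
                # Completer: a finished rule advances every item
                # that was waiting on this nonterminal.
                for l2, r2, d2, o2 in list(chart[origin]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        new = (l2, r2, d2 + 1, o2)
                        if new not in chart[i]:
                            chart[i].add(new)
                            worklist.append(new)
    return any(lhs == start and dot == len(rhs) and origin == 0
               for lhs, rhs, dot, origin in chart[len(tokens)])
```

Note that even for an ambiguous grammar like `E -> E + E | 'a'` the recognizer never has to undo a decision: all viable items simply coexist in the chart, which is why the backtracking question does not arise for this family of parsers.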
