0

GCC/Clang are handwritten parsers. I read a post saying that C++ can't be parsed by an LR(1) parser (Why can't C++ be parsed with a LR(1) parser?). If so, how come GCC/Clang are handwritten recursive descent parsers, when LR(1) is more powerful than recursive descent?

xilpex
  • 3,097
  • 2
  • 14
  • 45
  • How do you define power? – user4581301 May 12 '20 at 02:22
  • @user4581301 -- LR(1) can parse more grammars that recursive descent. – xilpex May 12 '20 at 02:23
  • "LR(1) can parse more grammars that recursive descent" on handwritten parser you can make it parse any grammar as you can change state of the parser based on context. On LR(1) when it is fully specified you cannot do that - you have to formally specify full grammar. – Slava May 12 '20 at 02:28
  • It seems like the link you pointed to answers the question, just not in the top accepted answer. Also, it seems like you're asking two different questions: (1) how does GCC/Clang parse C++ with a hand written parser, and (2) why are GCC/Clang hand written parsers. – ggorlen May 12 '20 at 02:36
  • Does this answer your question? [Why can't C++ be parsed with a LR(1) parser?](https://stackoverflow.com/questions/243383/why-cant-c-be-parsed-with-a-lr1-parser) – ggorlen May 12 '20 at 02:37

1 Answers1

5

GCC/Clang are not strict recursive descent parsers; they allow backtracking (reparsing an arbitrarily-long text), among other deviations from theoretical purity. Backtracking parsers are strictly more powerful than non-backtracking parsers (but at the cost of no longer being linear time).

The real complexities in parsing C++ vanish when the question is phrased this way. If you strip C++ down to the superset described by the BNF in Appendix A, then you basically just need to be prepared to consider several alternative parses. You can do that through backtracking -- trying one possibility and then if that fails, trying some other ones -- or with parallel exploration, using GLR/GLL or some other variant; nothing too painful. But the real work still needs to be confronted:

  • the preprocessor, which is not particularly complicated but cannot be described by anything remotely resembling a formal parsing framework;

  • template instantiation, which intermingles semantic analysis into the parsing process (and needs to be done in order to discover the correct parse);

  • name resolution, which some might not consider to be part of parsing, but you're not going to move on to the next step until you know which syntactic object a particular identifier refers to. (If you think name resolution is simple, reread the 13 dense pages of the C++ standard in section 6.5 (Name lookup), and then move on to the 35 pages on resolving overloaded names, section 12.)

rici
  • 234,347
  • 28
  • 237
  • 341