3

Most parsing can be done by looking only at the next symbol (character for lexical analysis, token for parsing proper), and most of the remaining cases can be handled by looking at only one symbol after that.

Are there any practical cases - for programming languages or data formats in actual use - that need several or indefinitely many symbols of lookahead (or equivalently backtracking)?

rwallace
  • 31,405
  • 40
  • 123
  • 242

2 Answers2

3

As I recall, Fortran is one language in which you need a big lookahead buffer. Parsing Fortran requires (in theory) unbounded lookahead, although most implementations limit the length of a statement line, which puts a limit on the size of the lookahead buffer.

Also, see the selected answer for Why can't C++ be parsed with a LR(1) parser?. In particular, the quote:

"C++ grammar is ambiguous, context-dependent and potentially requires infinite lookahead to resolve some ambiguities".

Community
  • 1
  • 1
Jim Mischel
  • 131,090
  • 20
  • 188
  • 351
  • Is that quote correct? I thought it was well-proven that C++ can be parsed by a LALR(1) parser, which is obviously a parser with bounded lookahead… – Nowhere man Mar 17 '12 at 00:13
  • @Nowhereman: Read the selected answer to the linked question, including the referenced PhD thesis and the second comment that gives a particularly good example. – Jim Mischel Mar 19 '12 at 16:45
2

Knuth proved that any LR(k) grammar can be mechanically transformed to a LR(1) one. Also, AFAIK, any LR(k) grammar can be parsed in time proportional to the length of the parsed string. As it includes LR(1), I don't see what use there would be for implementing LR(k) parsing with k > 1.

I never studied the LR(k)->LR(1) transformation, so it may be possible that it is not that practical for some cases, though.

Nowhere man
  • 5,007
  • 3
  • 28
  • 44