What do people mean when they say C++ has "undecidable grammar"?

Question

What do people mean when they say this? What are the implications for programmers and compilers?

I suspect most people here don't know what "undecidable" means in computer science. Check out the Wikipedia article: http://en.wikipedia.org/wiki/Undecidable_problem — Jay Conrod, Apr 27 '09 at 15:47
Most stuff I can google for about C++ having "undecidable" grammar merely states that a statement's meaning depends on previous definitions, i.e. on its context. Wow, how undecidable! This is like saying that a basketball play is "undecidably" good or bad because it depends on who's ahead and what the remaining time is. — Daniel Daranas, Apr 27 '09 at 16:01
@Daniel, That's not the formal meaning of undecidable. If the compiler always says valid/not valid for a program, then the language (or the subset the compiler actually works on) is decidable. If the compiler could potentially churn away forever and not terminate (or at least until it runs out of memory), then the language is not decidable. — Jay Conrod, Apr 27 '09 at 16:10
@Jay - thanks, I was talking about "most stuff I can google for", not about the formal definition :) I appreciate your answer below, it's very clarifying. — Daniel Daranas, Apr 27 '09 at 16:12

score 70 · Accepted Answer · edited May 23 '17 at 12:32

This is related to the fact that C++'s template system is Turing complete. This means (theoretically) that you can compute anything at compile time with templates that you could using any other Turing complete language or system.
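
As a minimal sketch of what compile-time computation with templates looks like (the `Factorial` name is illustrative and not from the answer), recursion is expressed through template instantiation and the base case through a specialization:

```cpp
// Recursive case: instantiating Factorial<N> forces Factorial<N - 1>.
template <unsigned N>
struct Factorial {
    static constexpr unsigned long long value = N * Factorial<N - 1>::value;
};

// Base case: the explicit specialization for 0 stops the recursion.
template <>
struct Factorial<0> {
    static constexpr unsigned long long value = 1;
};

// The compiler has to evaluate the whole chain to check this assertion.
static_assert(Factorial<5>::value == 120, "computed at compile time");
```

Nothing here runs at run time; the value exists only because the compiler executed the "program" encoded in the templates.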

This has the side effect that some apparently valid C++ programs cannot be compiled; the compiler will never be able to decide whether the program is valid or not. If the compiler could decide the validity of all programs, it would be able to solve the Halting problem.

~~Note this has nothing to do with the ambiguity of the C++ grammar.~~

Edit: Josh Haberman pointed out in the comments below and in a blog post with an excellent example that constructing a parse tree for C++ actually is undecidable. Due to the ambiguity of the grammar, it's impossible to separate syntax analysis from semantic analysis, and since semantic analysis is undecidable, so is syntax analysis.
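
A rough sketch of the kind of situation described here: whether a statement parses as a declaration or as an expression can depend on which template specialization is selected, and selecting it may require running a compile-time computation first. `IsEven` and `S` are hypothetical names chosen for this illustration, and `IsEven` stands in for an arbitrarily expensive template metaprogram.

```cpp
#include <cassert>

// Stand-in for an arbitrary template metaprogram (hypothetical name).
template <unsigned N>
struct IsEven {
    static constexpr bool value = !IsEven<N - 1>::value;
};
template <>
struct IsEven<0> {
    static constexpr bool value = true;
};

// Primary template: `name` is a static data member, i.e. a value.
template <bool B>
struct S {
    static int name;
};
template <bool B>
int S<B>::name = 6;

// Specialization for `true`: `name` is a type.
template <>
struct S<true> {
    typedef int name;
};

int y = 7;

int parsed_as_expression() {
    // IsEven<3>::value is false, so `name` is a variable and this statement
    // multiplies it by the global y, discarding the result.
    S<IsEven<3>::value>::name * y;
    return S<false>::name * y;  // the same product, returned for inspection
}

bool parsed_as_declaration() {
    // IsEven<4>::value is true, so `name` is a type and the same token shape
    // declares a local `int* y` that hides the global y.
    S<IsEven<4>::value>::name * y;
    y = &::y;  // compiles only because the local y is a pointer
    assert(*y == 7);
    return *y == 7;
}
```

The two statements differ only in the template argument, so the parser cannot pick a parse tree until the metaprogram has been evaluated. Compilers will typically warn that the first statement has no effect.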

See also (links from Josh's post):

answered Apr 27 '09 at 15:44 by Jay Conrod · edited May 23 '17 at 12:32 by Community

funny, the turing complete template system in c++ is what I consider one of its greatest strengths. – Evan Teran Apr 27 '09 at 16:11
This was news to me, but I totally saw it at the first sentence -- brilliant! @Evan -- yeah but being undecidable is not necessarily a "defect" -- it's just the way it is; just like axiomatic logic is not "defective" only because it is incomplete (Gödel). – Euro Micelli Apr 27 '09 at 18:11
@Evan, it's a strength for C++ programmers in that you can compute things at compile time. However, it makes it more difficult to write a good C++ compiler. – Jay Conrod Apr 27 '09 at 19:29
This is not true. An implementation is allowed to reject a program because template instantiation/recursion exceeds some arbitrary depth. The C++11 standard recommends allowing at least 1024 nested template instantiations, but that isn't actually a requirement. Thus all template metaprograms halt in O(1) time. Similarly C and C++ have preprocessor metaprograms, but the (recursive) nesting level of `#include` is allowed to be limited. – Potatoswatter Dec 16 '11 at 04:16
@Potatoswatter: it's more useful to consider C++ templates to be Turing-complete (unless your program hits the instantiation depth, which you can usually even increase), just as almost everybody does with computers (which do not have unlimited memory, hence are not Turing-complete). The compiler is unable to distinguish a genuinely looping program from one which would have needed just one more template instantiation. – Blaisorblade Jan 19 '12 at 10:47
@Blaisorblade: Useful yes, but in terms of formal definitions, it makes a big difference that the standard specifies an implementation-defined limit. However (and I should have remembered this in December), there is at least one instance of template recursion *without* template nesting, in the drill-down behavior of `operator->`. Endless recursion within this mechanism crashes possibly all C++ compilers except GCC, which I fixed last June by adjusting it to be considered (incorrectly in Standard terms) as nesting. http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49118 – Potatoswatter Jan 19 '12 at 10:59
This answer is incorrect. It does have to do with the ambiguity of the C++ grammar. The problem is that type/variable-name disambiguation that is traditionally performed by the "lexer hack" in plain C can require arbitrary template instantiation to resolve in C++. Since template instantiation is Turing-complete, simply producing a *parse tree* in C++ is undecidable in general, unless you limit template instantiation depth. – Josh Haberman Aug 20 '13 at 05:55
you're famous: http://blog.reverberate.org/2013/08/parsing-c-is-literally-undecidable.html – vartec Aug 29 '13 at 15:53
Thanks for the update Jay. I just noticed one little thing in your updated answer: you imply that "semantic analysis" is a monolithic thing, and that you have to do all of it to parse C++. But I think the only thing required to parse C++ is partial instantiation of class templates. I *think* (but can't prove) that many other parts of semantic analysis (overload resolution, implicit conversion, arithmetic conversions) aren't actually necessary to disambiguate the parse tree. It's possible I'm missing something, but "semantic analysis" seems like slightly too broad a brush. – Josh Haberman Aug 31 '13 at 00:33
Hmm, this might be too pedantic of me. Everyone I talk to about this just calls the whole package "semantic analysis." Oh well. – Josh Haberman Aug 31 '13 at 00:58
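
To make the instantiation-depth discussion in these comments concrete, here is a small sketch (the `Depth` template is a made-up name). Each level of recursion nests one more instantiation, so a compiler whose limit is below the requested depth is allowed to reject the program. GCC and Clang expose that limit through the `-ftemplate-depth=N` option.

```cpp
// Each Depth<N> needs Depth<N - 1> to be instantiated while it is itself
// still being instantiated, so the nesting depth grows linearly with N.
template <unsigned N>
struct Depth {
    static constexpr unsigned value = 1 + Depth<N - 1>::value;
};

template <>
struct Depth<0> {
    static constexpr unsigned value = 0;
};

// 256 levels is well inside the limits discussed above; a sufficiently
// large argument would be rejected instead of compiling forever.
static_assert(Depth<256>::value == 256, "nested 256 instantiations deep");
```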

score 13 · Answer 2 · answered Apr 27 '09 at 15:35

What it probably means is that the C++ grammar is syntactically ambiguous: you can write code that could mean different things depending on context. (The grammar is a description of the syntax of a language. It's what determines that `a + b` is an addition operation involving variables `a` and `b`.)

For example, `foo bar(int(x));`, as written, could be a declaration of a variable called `bar`, of type `foo`, with `int(x)` as its initializer. It could also be a declaration of a function called `bar`, taking an `int` and returning a `foo`. The language settles which one it is (the function declaration), but by a rule stated outside the grammar.
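
As a hedged illustration of how compilers resolve this statement: the disambiguation rule treats anything that can be read as a declaration as a declaration, so the line declares a function. The `foo` struct and the `vexing_demo` function below are assumptions made up for the sketch.

```cpp
#include <cassert>
#include <type_traits>

struct foo {
    explicit foo(int v) : value(v) {}
    int value;
};

int vexing_demo(double x) {
    // Declares a function `foo bar(int x)`; the parentheses around the
    // parameter name are redundant, and no object is created.
    foo bar(int(x));
    static_assert(std::is_function<decltype(bar)>::value,
                  "bar is a function declaration");

    // Extra parentheses cannot be part of a parameter declaration, so this
    // is an object initialized from the converted value of the argument x.
    foo baz((int(x)));
    static_assert(!std::is_function<decltype(baz)>::value,
                  "baz is an object");

    // Brace initialization (C++11) is never a function declaration.
    foo qux{int(x)};

    assert(baz.value == qux.value);
    return baz.value + qux.value;
}
```

Calling `vexing_demo(3.9)` truncates the argument to 3 for both objects. Many compilers also emit a warning for the first declaration, which is the kind of clue mentioned below.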

The grammar of a programming language is important for two reasons. First, it's a way to understand the language. Second, parsing is a part of compilation that can be made fast when the grammar is simple. As a result, C++ compilers are harder to write and slower to run than they would be if C++ had an unambiguous grammar. The ambiguity also makes certain classes of bugs easier to write, although a good compiler will provide enough clues.

If you make it 'foo * bar(int(x))' then it can be: (a) an expression, (b) an object declaration or (c) a function declaration. — Richard Corden, Apr 28 '09 at 09:33

score 10 · Answer 3 · edited Dec 30 '11 at 04:19

If "some people" includes Yossi Kreinin, then based on what he writes here ...

Consider this example:

x * y(z);

in two different contexts:

int main() {
    int x, y(int), z;
    x * y(z);
}

and

int main() {
    struct x { x(int) {} } *z;
    x * y(z);
}

... he means "You cannot decide by looking at x * y(z) whether it is an expression or a declaration." In the first case, it means "call function y with argument z, then invoke operator*(int, int) with x and the return value of the function call, and finally discard the result." In the second case, it means "y is a pointer to a struct x, initialized to point to the same (garbage & time-bomb) address as does z."
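
The two readings can be checked mechanically. The sketch below places both contexts in one file; the function names, the call counter, and the `nullptr` initialization of `z` (used here to avoid reading an uninitialized pointer) are additions for illustration and are not part of the original example.

```cpp
#include <cassert>
#include <type_traits>

int calls = 0;
int y(int v) { ++calls; return v; }  // global function named y

int as_expression() {
    int x = 2, z = 3;
    int before = calls;
    x * y(z);  // expression: calls ::y(z), multiplies, discards the result
    return calls - before;  // number of calls made by the statement above
}

bool as_declaration() {
    struct x { x(int) {} };
    x* z = nullptr;
    x * y(z);  // declaration: local y of type x*, initialized from z
    static_assert(std::is_pointer<decltype(y)>::value,
                  "y names a pointer here, not the global function");
    assert(y == z);
    return y == z;
}
```

The statement `x * y(z);` is character-for-character identical in both functions; only the meaning of the name `x` in scope decides which parse is chosen.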

Say you had a fit of COBOLmania and added DECLARE to the language. Then the second would become

int main() {
    DECLARE struct x { x(int) {} } *z;
    DECLARE x * y(z);
}

and the decidability would appear. Note that being decidable does not make the pointer-to-garbage problem go away.

answered Apr 27 '09 at 16:00 by Thomas L Holaday · edited Dec 30 '11 at 04:19 by Ry-

Of course, this is not limited to C++ - consider BASIC, for example - is "x = 1" an assignment or a test? Only in context can you tell. – Apr 27 '09 at 16:04
Yossi Kreinin does talk about "the problem making the C++ grammar undecidable", but that's bullshit; in fact the same website elsewhere, while explaining that C++ has undecidable grammar, says that "This shows (on an intuitive level) that the C++ grammar is quite context-sensitive." http://yosefk.com/c++fqa/defective.html#defect-2 – Blaisorblade Jan 19 '12 at 10:53
@anon: in BASIC it's enough to know against which production you are matching "y=1", i.e. if it is an expression or a statement. C++ is more complex, since the same character sequence, at the same position and in the same immediate context can mean totally different things. – Blaisorblade Jan 19 '12 at 10:56

score 3 · Answer 4 · answered Apr 27 '09 at 16:18

'Undecidable grammar' is a very poor choice of words. A truly undecidable grammar is such that there exists no parser for the grammar that will terminate on all possible inputs. What they likely mean is that C++ grammar is not context-free, but even this is somewhat a matter of taste: Where to draw the line between syntax and semantics? Any compiler will admit only a proper subset of those programs that pass the parser stage without syntax errors and only a proper subset of those programs actually run without errors, thus no language is truly context-free or even decidable (barring perhaps some esoteric languages).

answered Apr 27 '09 at 16:18 by TrayMan

There's a choice of terms, but it doesn't matter here whether the program terminates. In formal languages, a language is decidable if an algorithm can decide whether a word belongs to the language. In plain terms, if a program compiles it belongs to the language, no matter the run-time results. Furthermore, simply-typed lambda calculus is a quite simple example of a language where every program terminates, and many other more complex variations exist. – Blaisorblade Jan 19 '12 at 11:03
This is an old question & answer and I don't care to edit it, but perhaps I worded it poorly. The C++ template system is definitely Turing complete, hence it's undecidable whether a piece of code (or a 'word' in the technical jargon) is a valid C++ program or not. But does one consider the full C++ specification the 'C++ grammar'? Or can a C++ program be 'grammatically correct' but still fail to compile? The mention of context-free grammars in my answer is a bit misleading though. – TrayMan Jun 22 '22 at 09:52

score 0 · Answer 5 · answered Apr 27 '09 at 15:37

The implication for those of us using the language is that the error messages can get very weird, very fast. In practice this isn't such a big deal: STL errors are usually worse than the stuff you end up with due to the language grammar.

The implication for those who write the compilers is that they have to spend a lot of extra time and effort compiling the language.
