6

I'm interested in the famous The syntax of C in Backus-Naur Form and studied for a while, what confuse me is that some syntax looks wrong to me but is considered right according to the BNF.

For example, int test {}, what's this? I think this is a ill syntax in C, but the truth is the BNF considered this a function definition:

int -> type_const -> type_spec -> decl_specs
test-> id -> direct_declarator -> declarator
'{' '}' -> compound_stat
decl_specs declarator compound_stat -> function_definition

I tried this with bison, it considered the input int test {} is a right form, but I tried this on a C compiler, it will not compile.

So got questions:

  1. int test {} a right syntax or not?
  2. If it is a right syntax, what is that mean and why compiler do not recognized it?
  3. If it is an ill syntax, can I say the BNF is not rigorous? And does that mean modern C compiler does not stick with this BNF?

2 Answers2

3

The grammar is necessary but not sufficient to describe a valid C program. For that you need constraints from the standard too. A simpler example of this would be 0++, which follows the syntax of a C expression, but certainly isn't a valid program fragment...

C11 6.9.1p2:

  1. The identifier declared in a function definition (which is the name of the function) shall have a function type, as specified by the declarator portion of the function definition. [162]

The footnote 162 explains that the intent of the constraint is that a typedef cannot be used, i.e. that

typedef int F(void);
F f { /* ... */ }

will not be valid, even though such a typedef could be used for a function declaration, i.e.

F f;

would declare the function

int f(void);

But mere existence of this constraint also proves that the BNF grammar in itself is not sufficient in this case. Hence you are correct in that the grammar would consider such a fragment a function definition.

  • Can I understand that this in this BNF, `int test {}` is syntax valid, but semantics wrong? –  Mar 22 '19 at 08:57
  • Thanks for answering. I'm writing a parser for C language, do you happen to know if there's a more 'accurate' BNF? A BNF which follows the constraints as much as possible, to give `int test {}` a wrong syntax error? –  Mar 22 '19 at 09:02
  • @reavenisadesk No, I doubt that is ever possible... At least not in BNF. You just need to go through the parsed tree and check the constraints otherwise. – Antti Haapala -- Слава Україні Mar 22 '19 at 09:03
  • You can consider the *English* language, if you'd write a BNF for all the *semantically valid sentences* it would be quite a bit more complicated than the BNF of *syntax*... – Antti Haapala -- Слава Україні Mar 22 '19 at 09:04
  • You're right, I guess the `test-> id -> direct_declarator -> identifier` path just isn't possible after all. Unless there's another exotic way of defining a function. – Blaze Mar 22 '19 at 09:06
  • @AnttiHaapala I understand, I just got a feeling this is not a semantically problem but a syntax problem... Just like as if "Noun Noun Verb" is allow in English but you can't say "sheep grass eat"... –  Mar 22 '19 at 09:15
  • 1
    But you can say "[Snow sheep](https://en.wikipedia.org/wiki/Snow_sheep) eat." and it *is* syntactically *and* semantically correct to a letter! – Antti Haapala -- Слава Україні Mar 22 '19 at 09:39
  • 1
    @antti: but in this case, the constraint really is syntactic. And it would have been possible to write it out in BNF, but they chose not to because the resulting bloat would have made the grammar even harder to read. The tagged-non-terminal formalism used by Ecmascript (which is "just" syntactic sugar over BNF) makes this sort of constraint much easier to write (and read). – rici Mar 22 '19 at 14:48
  • In C (but not C++), `lvalue` can also be defined syntactically. In theory. But practically no-one would actually want to write and validate a grammar which did that. – rici Mar 22 '19 at 14:51
  • @rici and not only an lvalue but *modifiable* lvalue - `foo++` does not make sense for `int foo[]`... – Antti Haapala -- Слава Україні Mar 23 '19 at 08:12
  • @antti: true, you cannot reject non-modifiable lvalues syntactically. Like you can't reject redefinitions. But you could specify that a function definition has to be based on a function declaration, if you wanted to. – rici Mar 23 '19 at 12:00
1

The BNF form is a precise way to describe the syntax of a language, i.e. what to do precisely to get the parse tree starting from raw input.

For each language you can define infinite many grammars that describe that language. The properties of these grammars that describe the same language can differ a lot.

If you study the grammar of the C language, take care as it is not context free but context sensitive, which means, the decision of choosing a rule or other depends on what there is around that point in input.

Read about the lexer hack to see how to correctly interpret the Backus Naur form of the C grammar.

alinsoar
  • 15,386
  • 4
  • 57
  • 74