8

Here it is syntactically impossible to tell whether f/g are function calls or typecasts without knowing how they are declared. Do compilers know the difference in the parse step, or do they usually resolve this in a second pass?

```c
void f(int x){};
typedef short g;

int main(void){
   ((f)(1));
   ((g)(1));
   return 0;
}
```
Andrew Johnson
  • The compiler has the type or variable information before it evaluates the expression. – BLUEPIXY Jul 21 '14 at 23:21
  • 5
    I'm going to go out on a limb and tell you that someone is going to comment on void main(). (I guess I just did.) But it is an interesting question. (Where is Kieth Thompson?) – ryyker Jul 21 '14 at 23:22
  • 1
    @BLUEPIXY we're talking about parsing, which is a long time before evaluation happens – M.M Jul 21 '14 at 23:22
  • @MattMcNabb In C, they must be declared before they can be used. – BLUEPIXY Jul 21 '14 at 23:24
  • 2
    [See here](http://en.wikipedia.org/wiki/The_lexer_hack) for explanation. Summary: the lexer has to be able to query the semantic analysis that has been performed so far, in order to know whether the name is a typename or not. – M.M Jul 21 '14 at 23:29
  • This is actually mentioned in *C: A Reference Manual (5th Edition)*. The lexer must have a sneak peek at the semantic analysis results. – Filipe Gonçalves Jul 21 '14 at 23:30
  • @ryyker the semicolon at the end of the first line is illegal in C also (the grammar considers it a statement rather than a declaration) – M.M Jul 21 '14 at 23:34
  • 1
    @ryyker: Here I am! I forgive you for misspelling my name. – Keith Thompson Jul 22 '14 at 00:02
  • I'm very tempted to change the incorrect `void main()` to `int main(void)`, but editing other people's code is often frowned upon. Please fix it yourself. See questions 11.12a and 11.12b in the [comp.lang.c FAQ](http://www.c-faq.com/). – Keith Thompson Jul 22 '14 at 00:14
  • Not just `void main(void)`: `void main(an, unspecified, number, of, arguments)`. But I digress. When the compiler sees `f` or `g`, it just sees an identifier. It then has to go look and see what (or if) it is defined as. <<=ending in preposition - the English equivalent of `void main()`. – technosaurus Jul 22 '14 at 00:26
  • @KeithThompson - Lol, I knew when I first read this question that there was likely some mysterious and irresistible draw already pulling you from wherever you were. I continue to be entertained by the debate. I confess, I am tempted to ask a version of this question simply to give you (et al.) the opportunity to answer the question _why there seems to be a level of fervor (by many) that does not seem to be supported by the standard?_ (***[ref. this question](http://stackoverflow.com/a/9356660/645128)***, which you participated in). – ryyker Jul 22 '14 at 14:36
  • @KeithThompson - And sorry for misspelling your name :) – ryyker Jul 22 '14 at 14:54
  • @MattMcNabb - you probably meant your comment to me to go to BLUEPIXY? I had not previously engaged in that sub-discussion. – ryyker Jul 22 '14 at 15:00
  • Just realized that type casting and parentheses have different operator precedence. The plot thickens... – Andrew Johnson Oct 03 '14 at 23:46

2 Answers

6

Very early versions of C (before the first edition of K&R was published in 1978) did not have the typedef feature. In that version of C, a type name could always be recognized syntactically. int, float, char, struct, and so forth are keywords; other elements of a type name are punctuation symbols such as * and []. (Parsers can distinguish between keywords and identifiers that are not keywords, since there are only a small and fixed number of them.)

When typedef was added, it had to be shoehorned into the existing language. A typedef creates a new name for an existing type. That name is a single identifier -- which is not syntactically different from any other ordinary identifier.

A C compiler must maintain a symbol table as it parses its input. When it encounters an identifier, it needs to consult the symbol table to determine whether it's a type name. Without that information, the grammar is ambiguous.

In a sense, a typedef declaration can be thought of as creating a new temporary keyword. But such a keyword can be hidden by a new declaration in an inner scope.

For example:

```c
{
    typedef short g;
    /* g is now a type name, and the parser has
     * to treat it almost like a keyword
     */
    {
        int g;
        /* now g is an ordinary identifier as far as the parser is concerned */
    }
    /* And now g is a type name again */
}
```
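To make the scoping behavior concrete, here is a minimal sketch of the kind of scoped symbol table a parser might consult. Every name in it (`enter_scope`, `leave_scope`, `declare`, `is_type_name`) is hypothetical and chosen for illustration; real compilers use hash tables and store much more per entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Illustrative sketch only: all names are hypothetical, not from any compiler. */
enum { MAX_SYMS = 64 };

struct sym {
    const char *name;
    bool is_typedef;  /* true: declared with typedef; false: ordinary identifier */
    int depth;        /* scope nesting level of the declaration */
};

static struct sym syms[MAX_SYMS];
static size_t nsyms = 0;
static int depth = 0;

static void enter_scope(void) { depth++; }

/* Leaving a scope discards every declaration made inside it. */
static void leave_scope(void) {
    while (nsyms > 0 && syms[nsyms - 1].depth == depth)
        nsyms--;
    depth--;
}

static void declare(const char *name, bool is_typedef) {
    assert(nsyms < MAX_SYMS);  /* fixed capacity keeps the sketch short */
    syms[nsyms++] = (struct sym){ name, is_typedef, depth };
}

/* Search the most recent declarations first so inner ones hide outer ones. */
static bool is_type_name(const char *name) {
    for (size_t i = nsyms; i-- > 0; )
        if (strcmp(syms[i].name, name) == 0)
            return syms[i].is_typedef;
    return false;  /* undeclared: treat as an ordinary identifier */
}
```

Replaying the example above against this table: after `enter_scope(); declare("g", true);` the lookup `is_type_name("g")` is true; after a nested `enter_scope(); declare("g", false);` it is false; and after `leave_scope()` it is true again.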

Parsing C is hard.

Keith Thompson
  • "Parsing C is hard." no kidding. This is just the tip of the iceberg right? – Andrew Johnson Jul 22 '14 at 00:15
  • @AndrewJohnson: Actually, I think that's most of the iceberg. Apart from the issue with type names, the C grammar is regular enough to be parsed using standard techniques (as far as I know). – Keith Thompson Jul 22 '14 at 00:16
  • Between `int *f(), g;` and `int *f, g();` which are pointers? – Andrew Johnson Jul 22 '14 at 00:32
  • 1
    parsing(c,is,hard,but); it::could( though ); – technosaurus Jul 22 '14 at 00:37
  • @AndrewJohnson: In the first, `f` is a function returning pointer to `int` and `g` is an `int`. In the second, `f` is a pointer to `int` and `g` is a function returning `int`. This can be confusing for human readers, but it's straightforward for a parser given the language grammar. – Keith Thompson Jul 22 '14 at 00:41
4

I think they do it lazily: whenever a token is parsed, the parsing of the next token is delayed until that symbol's semantic information is known. Then when the next token is parsed, the compiler already knows whether the symbol being referred to is a type name or not (it must have been declared earlier), and can act accordingly.
(So in this approach the semantic and syntactic analyses are intertwined and cannot be separated.)
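As a rough illustration of that intertwining, the sketch below classifies each identifier at the moment it is lexed, using a list of typedef names that stands in for whatever the parser has recorded so far. The names `struct lexer`, `next_token`, and `classify` are hypothetical, and the code only handles input shaped like `(name)(1)`; it is not how any particular compiler is written.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Illustrative sketch only: all names are hypothetical. */
enum token_kind { TOK_IDENT, TOK_TYPE_NAME, TOK_OTHER, TOK_END };

struct lexer {
    const char *pos;              /* next unread character */
    const char *const *typedefs;  /* NULL-terminated list of known typedef names */
};

static bool known_typedef(const struct lexer *lx, const char *s, size_t n) {
    for (const char *const *t = lx->typedefs; *t != NULL; t++)
        if (strlen(*t) == n && strncmp(*t, s, n) == 0)
            return true;
    return false;
}

/* Return the kind of the next token; identifiers are classified on the spot. */
static enum token_kind next_token(struct lexer *lx) {
    while (isspace((unsigned char)*lx->pos))
        lx->pos++;
    if (*lx->pos == '\0')
        return TOK_END;
    if (isalpha((unsigned char)*lx->pos) || *lx->pos == '_') {
        const char *start = lx->pos;
        while (isalnum((unsigned char)*lx->pos) || *lx->pos == '_')
            lx->pos++;
        return known_typedef(lx, start, (size_t)(lx->pos - start))
                   ? TOK_TYPE_NAME : TOK_IDENT;
    }
    lx->pos++;  /* punctuation or digit: consume one character */
    return TOK_OTHER;
}

/* For input shaped like "(name)(1)": a cast if name is a type, else a call. */
static const char *classify(const char *src, const char *const *typedefs) {
    assert(src != NULL && typedefs != NULL);
    struct lexer lx = { src, typedefs };
    if (next_token(&lx) != TOK_OTHER || lx.pos[-1] != '(')
        return "unknown";
    switch (next_token(&lx)) {
    case TOK_TYPE_NAME: return "cast";
    case TOK_IDENT:     return "call";
    default:            return "unknown";
    }
}
```

With a typedef list containing only `g`, `classify("(g)(1)", ...)` yields `"cast"` and `classify("(f)(1)", ...)` yields `"call"`, which matches the two statements in the question. The point of the design is that the token kind depends on earlier declarations, so the lexer cannot run as a fully independent first pass.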

user541686