9

Here is a simple function with one int parameter:

void f(int x) {}

f(42);

And here is another function with one int parameter:

void g(int(x)) {}

g(42);

Now let us define x to be a type:

typedef int x;
void h(int(x)) {}

h(42);
// warning: passing argument 1 of ‘h’ makes pointer from integer without a cast

(This is the behavior I observe with gcc 4.8.2)

How do parser writers deal with this situation?

It seems the classic pipeline Lexer -> Parser -> Semantic Checker -> ... does not work here.

fredoverflow
  • 256,549
  • 94
  • 388
  • 662
  • 5
    Oh Fred, you're into the most vexing parse again. Yep, it's the same mechanism. And you're right, there is knowledge flowing back in the pipeline. – Cheers and hth. - Alf Feb 28 '15 at 19:35
  • Someone removed the C++ tag. I put it back. Do you want this to be C or C++ @FredOverflow. – Cheers and hth. - Alf Feb 28 '15 at 19:42
  • 1
    The lexer sends a symbol x -> the parser knowing the meaning of the sybols finds the potential gramatical rules (and in doubt, applies the most vexing **parse**) -> the semantic checker looks for potential type conversions. I don't know what should not work in the classic pippeline...? – Christophe Feb 28 '15 at 19:44
  • Is your question more about what kind of token the lexical analyzer interprets `x` as or about the grammar rules that interpret this as a function pointer rather than another parse? – jschultz410 Feb 28 '15 at 20:02
  • @jschultz410 My question is, as I wrote: `How do parser writers deal with this situation?` There must be some standard idioms, patterns, whatever, because I bet there are dozens of these exceptions to deal with in real world parsers. – fredoverflow Feb 28 '15 at 20:31
  • Perhaps you could clarify the question by explaining how a naive parser would get this wrong? (It may be obvious to anyone with a shot at actually figuring out the answer, but making it clearer to the rest of us wouldn't hurt. At a minimum you might get fewer answers that miss your point.) – Harry Johnston Mar 01 '15 at 01:25
  • @FredOverflow Not sure if you know all this already or not, but here's a lexical analyzer and grammar for C11: http://www.quut.com/c/ANSI-C-grammar-l-2011.html Before the typedef, `x` will be interpreted as an `IDENTIFIER` while after, it will be interpreted as a `TYPEDEF_NAME` because the analyzer is being informed through the symbol table that `x` is now a type. In this particular case, the parsing is pretty straightforward then. The feedback occurs through the symbol table in this case. – jschultz410 Mar 01 '15 at 04:37
  • @jschultz410 But wouldn't that make `void foo() { typedef int x; { int x; } }` illegal, because `int x;` would now be parsed as `INT TYPEDEF_NAME`? – fredoverflow Mar 01 '15 at 08:51
  • @FredOverflow It does make typedef int x; int x; at the same scope illegal, for example. I believe your example is legal because the symbol table changes the type of symbol `x` is depending on its scope. Again, that is feedback occurring from the semantic level down to the lexer through the symbol table. – jschultz410 Mar 01 '15 at 14:57
  • Actually, I'm not so sure now because typedef int x; int main() { x x = 5; return x; } looks legal too. – jschultz410 Mar 01 '15 at 14:59
  • Unless there is a grammar rule that matches IDENTIFIER IDENTIFIER ; that I'm not able to see ... – jschultz410 Mar 01 '15 at 15:15
  • @jschultz410 In the meantime, I discovered the article [How Clang handles the type / variable name ambiguity of C/C++](http://eli.thegreenplace.net/2012/07/05/how-clang-handles-the-type-variable-name-ambiguity-of-cc/) and found it to be very helpful. TL;DR: The lexer does not need to be hacked with artificial type identifiers. All ambiguities can be resolved inside the parser. – fredoverflow Mar 01 '15 at 15:24
  • @FredOverflow That's a great series of articles on this topic. The first couple of articles http://eli.thegreenplace.net/2007/11/24/the-context-sensitivity-of-cs-grammar/ confirmed that most lex-yacc approaches use feedback from the higher levels to try to distinguish typedef-name's from regular identifiers in the lexer. – jschultz410 Mar 01 '15 at 19:35
  • Answer is given here: http://stackoverflow.com/q/22977043/103167 Most importantly, "Syntactic and semantic analysis are performed together, inseparably. Semantic errors are caught during the syntactic phase of the parse, phase 7." – Ben Voigt Mar 02 '15 at 02:10

2 Answers2

12

You've effectively defined h as:

void h(int(int)) {}

The parameter is interpreted as an unnamed function pointer that takes an int and returns an int. When you try to pass 42 to it, the compiler complains that you are trying to make a function pointer from an integer.

I think what you are asking for is how do compilers handle (unnamed) function pointer types and their possibly ambiguous parses. Your question is related to the the most vexing parse in C++.

There they decided that whenever there was ambiguity between a function pointer type and another way to parse, then it would be interpreted as a function pointer. They did that because there are other ways to disambiguate when you don't want it to be a function pointer (e.g. - surround it in parentheses, use {} initializer syntax, etc.).

Getting into the specifics of how a parser writer might deal with this parse, here's a lexical analyzer and grammar for C11: http://quut.com/c/ANSI-C-grammar-l-2011.html In your example, before the typedef, x will be an IDENTIFIER token while after, it will be a TYPEDEF_NAME token because the analyzer is being informed through the symbol table that x is now a type. In this particular case, the parsing is unambiguous then. The "pipeline feedback" that you seem to be referring to occurs through the symbol table in this case, where the lexical analyzer is informed about context by the higher levels that affects its output as the compilation progresses.

EDIT: These three articles, found by the OP, describe this problem and how it is solved by some C parsers / compilers very nicely. Basically, a context free grammar (CFG) that only accepts / generates legal C syntax can almost be specified. With the introduction of a scoped lookup table that allows the lexical analyzer to distinguish between identifiers and typedef-names appropriately, then a CFG [and more importantly a LALR(1) parser (e.g. - yacc generated)] that only accepts / generates legal C syntax can be specified.

Here's an even scarier example than the OP's:

typedef int x;

int main() { x x = 5; return x; }  /* crazily enough this is legal C syntax and a well formed C program */
jschultz410
  • 2,849
  • 14
  • 22
6

After introducing typedef

typedef int x;

the function has the following definition

void h(int( int ) ) {}

that is its parameter is declared as having type of function int( int ) that is adjusted to pointer to function.

You call the function supplying an integer:

h(42);

There is no implicit conversion from integer to function pointer.

I do not see a problem with

It seems the classic pipeline Lexer -> Parser -> Semantic Checker -> ... does not work here.

The parameter is substituted for the typedef.
x has a compiler attribute of a type. So it considers the record like

type-specifier h(type-specifier( type-name ) ) {}
Vlad from Moscow
  • 301,070
  • 26
  • 186
  • 335
  • 3
    I think Fred is asking about the use of non-syntactical information in the parse, i.e. how knowledge of the nature of `x` influences the parsing. – Cheers and hth. - Alf Feb 28 '15 at 19:37
  • 1
    Thanks Vlad. I think I disagree with your reasoning about substitution, though. At the least I think, if it is correct, then a standards quote would be nice. – Cheers and hth. - Alf Feb 28 '15 at 19:45
  • @Cheers and hth. - Alf The compiler simply builds a table of names. And each name in the table has its attributes that describe the name. – Vlad from Moscow Feb 28 '15 at 19:47
  • Thanks, and yes I have implemented a compiler (as a student, I think that was in 1984), and quite a few interpreters, so I'm familiar with how they work internally. Although it's been so long that if you throw some grammar names at me I would have to google it. :) – Cheers and hth. - Alf Feb 28 '15 at 19:50
  • 1
    Vlad, you didn't answer the question. @Cheersandhth.-Alf, what on earth do quotes from the standard have to do with it?? – Lightness Races in Orbit Feb 28 '15 at 19:54
  • @Cheers and hth. - Alf Where to find time that to read all that exists in programming? – Vlad from Moscow Feb 28 '15 at 19:55
  • @LightnessRacesinOrbit: The precise mechanism involved at the formal level, constraining what guaranteed behavior one can rely one, is sometimes quite different from the practical details of how an actual compiler deals with the issue. And I think Fred is asking about the formal level. I.e., language lawyer stuff, how, in the formal, information flows back from semantic analysis to syntactical parsing. – Cheers and hth. - Alf Feb 28 '15 at 20:00
  • 2
    @Cheers: And the standard discusses precisely zero of those things. At least, it doesn't discuss any further than what the phases of translation are, but Fred already knows those. What he's asking for is the details at the formal level of _implementations_, not of the C++ abstract machine. – Lightness Races in Orbit Feb 28 '15 at 20:01