29

I used to think C++ was the "weird" one with all the ambiguities with < and >, but after trying to implement a parser I think I found an example which breaks just about every language that uses < and > for generic types:

f(g<h, i>(j));

This could be syntactically either interpreted as a generic method call (g), or it could be interpreted as giving f the results of two comparisons.

How do such languages (especially Java, which I thought was supposed to be LALR(1)-parsable?) get around this syntactic ambiguity?

I just can't imagine any non-hacky/context-free way of dealing with this, and I'm baffled at how any such language can be context-free, let alone LALR(1)-parsable...

(It's worth noting that even a GLR parser can't return a single parse for this statement with no context!!)

user541686
  • 205,094
  • 128
  • 528
  • 886
  • 2
    +1 great question, Just a note, the java compiler has plenty of heuristics to resolve ambiguity, it guesses plenty. They themselves admit that their generic system is broken and there are cases it fails (although they are rate), look at the java 8 features presentation they explain some of the problems and how they'll address them in java 8. – Benjamin Gruenbaum Jan 19 '13 at 08:35
  • @BenjaminGruenbaum: Thanks. :) Regarding heuristics -- they're definitely great for generating good compiler error messages, but unless they're part of the language spec's grammar, they can't be legally used to resolve ambiguities, can they? And if they *are* part of the grammar, then how can the grammar be context-free (let alone LR(1)-parsable)? – user541686 Jan 19 '13 at 08:36
  • 2
    its not amibugous, it depends on what g is declared as – NimChimpsky Jan 19 '13 at 08:43
  • 2
    @NimChimpsky: Yeah, that's the point -- it's **syntactically** ambiguous (since it depends on the context), but not *semantically* ambiguous (since, given the context, it's unambiguous). – user541686 Jan 19 '13 at 08:44
  • Could this be part of why scala went with `[]`? – Karthik T Jan 19 '13 at 08:52
  • @KarthikT: Huh, maybe... I don't know Scala well at all unfortunately but that sounds very plausible. – user541686 Jan 19 '13 at 08:54
  • @NimChimpsky: Also, I just noticed, your name is surprisingly similar to Chomsky! :) – user541686 Jan 19 '13 at 09:04
  • 1
    @Mehrdad lol, its the name of a talking monkey, named after noam. – NimChimpsky Jan 19 '13 at 09:09
  • @Mehrdad. It's definitely why D went with `!()`. http://dlang.org/templates-revisited.html ("D solves it by noticing that ! is not used as a binary operator") – rici Jan 20 '13 at 01:17
  • The point of GLR is to give you *all* the valid *syntactic* parses of your context free language. It would have no trouble parsing your example, and would produce both parses. To use this result, you'd have to decide which of the ambiguous choices made sense, using something other than this particular form (e.g., actual type information about "g" is). That's how we solve with our tools, and it nicely isolates parsing from name and type resolution. As other have noted, Java *was* LALR(1) in a galaxy long, long ago, and far far away. – Ira Baxter Jun 05 '13 at 05:24

2 Answers2

4

a generic method call in java would be <h,i>g(j) so there is no ambiguity :)

ice1000
  • 6,406
  • 4
  • 39
  • 85
Zubzub
  • 782
  • 7
  • 18
  • Wow, you're right... I thought it would be `g(j)` but [apparently it isn't](http://ideone.com/bYyaaj)! +1 Great answer, thanks :) – user541686 Jan 19 '13 at 19:11
  • That was my reaction, too. I think it make it easier for the parsers, and I have to admit it is a really ugly hack. Langauge designers shouldn't listen to the compiler guys complaining when they design; you get stuff like this. – Ira Baxter Jun 05 '13 at 05:27
2

I just can't imagine any non-hacky/context-free way of dealing with this, and I'm baffled at how any such language can be context-free, let alone LALR(1)-parsable...

The answer is that they aren't (at least not Java and C++; I know very little about C#). The Java grammar that you link to dates back to 1996, way before generics have been introduced.

For further discussion, see Are C# and Java Grammars LALR(x)?

Community
  • 1
  • 1
NPE
  • 486,780
  • 108
  • 951
  • 1,012
  • Not even C is fully context-free since the use of a symbol can represent either an identifier or a type name, depending on declarations that can be separated from the use by an arbitrary amount of code. – ebohlman Jan 23 '13 at 18:18