JVM spec: grammar for generic signatures requires lookahead

Question

In the spec for generic signatures, ClassSignature has the form

ClassSignature:
  [TypeParameters] SuperclassSignature {SuperinterfaceSignature}
TypeParameters:
  < TypeParameter {TypeParameter} >
TypeParameter:
  Identifier ClassBound {InterfaceBound}
ClassBound:
  : [ReferenceTypeSignature]
InterfaceBound:
  : ReferenceTypeSignature

So the superclass bound of a type parameter can be omitted (some examples here).

If I have a class declaration public class A<T, LFooBar>, the Java compiler generates the signature <T:Ljava/lang/Object;LFooBar:Ljava/lang/Object;>Ljava/lang/Object;.

IIUC, the class bound could be omitted, in which case the signature would be <T:LFooBar:>Ljava/lang/Object;.

Parsing that short signature requires looking ahead to the second : in order to know that T:LFooBar: are two type parameters, and not one type parameter T with class bound FooBar.

Maybe in practice, leaving away the class bound is only done if there's an interface bound? For public class A<T extends Comparable<? super T>>, javac produces the signature <T::Ljava/lang/Comparable<-TT;>;>Ljava/lang/Object;. But I guess I cannot rely on this assumption.

Did I misunderstand something?

Antimony · Answer 1 · 2017-06-01T00:39:42.163

If you look closely, the only thing that can follow an omitted ReferenceTypeSignature is an Identifier or a >. A ReferenceTypeSignature must either begin with a [ or end with a ;, and identifiers can't contain these characters. Identifiers, in turn, must be followed by a :, which can't appear in type signatures. So there is no ambiguity between those options.

Note that identifiers can start with >, so you need to look ahead for a colon to determine whether you are at the end of TypeParameters or not. But that's a separate issue.

I'm not sure how the JVM implements it, but one possible approach is this:

Examine the first character. If it is [, you have a type signature. If it is >, scan ahead for the first [, ;, or :. If the first one you see is :, you have an identifier; otherwise you have the end of the type parameters.
Otherwise, scan ahead for the first ; or :. If it is :, you have an identifier; otherwise you have a class bound.
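
The scan described above could be sketched roughly as follows. This is only an illustration, not how javac or the reflection parser works: the class name BoundLookahead, the enum Next, and the method classify are hypothetical names invented for this sketch. It assumes pos points at the character right after the : of a ClassBound, treats > as the end of the type parameters (identifiers in signatures cannot contain >), and additionally reports a leading : as the start of an InterfaceBound, which the quoted grammar allows after an omitted class bound.

```java
// Hypothetical helper, not part of any JDK API.
final class BoundLookahead {

    // What comes right after the ':' of a ClassBound.
    enum Next { CLASS_BOUND, INTERFACE_BOUND, IDENTIFIER, END_OF_TYPE_PARAMETERS }

    // pos is assumed to be the index just after the ':' of a ClassBound.
    static Next classify(String sig, int pos) {
        char first = sig.charAt(pos);
        if (first == '[') {
            return Next.CLASS_BOUND;             // array type signature
        }
        if (first == '>') {
            return Next.END_OF_TYPE_PARAMETERS;  // identifiers cannot contain '>'
        }
        if (first == ':') {
            return Next.INTERFACE_BOUND;         // class bound omitted, interface bound follows
        }
        // Scan ahead: whichever of ':' or ';' comes first decides.
        for (int i = pos; i < sig.length(); i++) {
            char c = sig.charAt(i);
            if (c == ':') {
                return Next.IDENTIFIER;          // name of the next type parameter
            }
            if (c == ';') {
                return Next.CLASS_BOUND;         // a reference type signature ends with ';'
            }
        }
        throw new IllegalArgumentException("malformed signature: " + sig);
    }

    public static void main(String[] args) {
        // Index 3 is the character after the first ':' in each signature.
        System.out.println(classify("<T:LFooBar:>Ljava/lang/Object;", 3));
        // prints IDENTIFIER
        System.out.println(classify("<T:Ljava/lang/Object;LFooBar:Ljava/lang/Object;>Ljava/lang/Object;", 3));
        // prints CLASS_BOUND
        System.out.println(classify("<T::Ljava/lang/Comparable<-TT;>;>Ljava/lang/Object;", 3));
        // prints INTERFACE_BOUND
    }
}
```

The lookahead is unbounded in principle, since it runs to the end of the next identifier or to the first ; of the type signature, but it never has to backtrack past the position it started from.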

Edit: Identifiers in signatures cannot contain >, so ignore that bit. (They also can't contain :, another potential source of ambiguity)

edited Jun 01 '17 at 00:39

answered May 31 '17 at 14:36

Antimony

Thanks, so scanning ahead is what I'm doing, and it seems to be necessary. You mention "Note that identifiers can start with >" - can you provide a reference? The spec says "The grammar includes the terminal symbol Identifier [...] Such a name must not contain any of the ASCII characters . ; [ / < > :" (https://docs.oracle.com/javase/specs/jvms/se8/html/jvms-4.html#jvms-4.7.9.1) – Lukas Rytz May 31 '17 at 15:08

The JVM doesn’t implement it at all, as the JVM doesn’t care about Generics. Within [the Reflection code](http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/8u40-b25/sun/reflect/generics/parser/SignatureParser.java#SignatureParser.parseBounds%28%29) there’s no look-ahead at all. So the assumption that “*leaving away the class bound is only done if there's an interface bound*” might be true, but it’s still a mismatch between the specification and the implementation, and we don’t know in which direction the fix will go. – Holger May 31 '17 at 16:21
@Lukas Sorry, it looks like I was wrong about that bit. – Antimony Jun 01 '17 at 00:39
