Why do generic methods and generic types have different type introduction syntax?

Question

While studying generics, I noticed a difference in type introduction syntax between generic methods and generic types (class or interface) that confused me.

The syntax for a generic method is

<T> void doStuff(T t) {
    // Do stuff with T
}

The docs say

The syntax for a generic method includes a type parameter, inside angle brackets, and appears before the method's return type

The syntax for a generic type is

class Stuff<T> {
    // Do stuff with T
    T t;
}

The docs say

The type parameter section, delimited by angle brackets (<>), follows the class name. It specifies the type parameters

Neither passage states why it must come before or after.

To be consistent with each other, I expected either the method syntax to be
void doStuff<T>(T t) {} or the type syntax (for a class) to be class <T>Stuff {}, but that is obviously not the case.

Why does the one have to be introduced before, and the other after?

I have mostly used generics in the form of List<String>, and I could argue that <String>List would look weird, but that is a subjective argument; besides, for methods it is written that way as well. You can call doStuff like this: this.<String>doStuff("a string");
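A minimal sketch of the two placements, using a hypothetical WitnessDemo class (not from the question): for a generic type the type arguments follow the type name, while an explicit type argument on a method call goes after the dot and before the method name. Java only accepts that form on a qualified call (this., a class name, or another expression), so a bare <String>doStuff(...) does not compile.

```java
import java.util.List;

// Hypothetical class echoing the question's doStuff example.
class WitnessDemo<T> {
    final T value;

    WitnessDemo(T value) {
        this.value = value;
    }

    // Generic method: its type parameter section <U> precedes the return type.
    <U> U doStuff(U u) {
        return u;
    }

    // Generic static method, called below with a class-qualified type argument.
    static <U> List<U> single(U u) {
        return List.of(u);
    }

    String callWithWitness() {
        // Explicit type arguments come after the dot and before the method name.
        // An unqualified <String>doStuff("a string") is a compile error.
        return this.<String>doStuff("a string");
    }

    public static void main(String[] args) {
        // For a generic type, the type arguments follow the type name.
        WitnessDemo<Integer> demo = new WitnessDemo<>(42);
        System.out.println(demo.value);                       // 42
        System.out.println(demo.callWithWitness());           // a string
        System.out.println(WitnessDemo.<String>single("x"));  // [x]
    }
}
```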

Looking for a technical explanation, I thought that perhaps <T> must be introduced to a method before the return type because T might be the return type, and maybe the compiler isn't able to look ahead like that. But that sounded odd, because compilers are smart.

I figure there is an explanation for this beyond "the language designers just made it that way", but I could not find it.

Are C++ templates older than Java generics? The syntax is so similar that maybe the question should be put to the C++ designers instead of the Java ones – Pablo Lozano, Aug 08 '17 at 12:33

score 19 · Accepted Answer · edited Sep 15 '17 at 09:30


The answer indeed lies in the GJ Specification, which has already been linked. Quoting from the document, p. 14:

The convention of passing parameters before the method name is made necessary by parsing constraints: with the more conventional “type parameters after method name” convention the expression f (a<b,c>(d)) would have two possible parses.

Explanation from the comments:

f(a<b,c>(d)) can be parsed either as f(a < b, c > d) (two booleans from comparisons passed to f) or as f(a<b, c>(d)) (a call of a with type arguments b and c and value argument d, passed to f). I think this might also be why Scala chose to use [] instead of <> for generics.
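A quick check of the ambiguity, assuming a, b, c and d are plain int variables and f is a hypothetical method taking two booleans: the token sequence from the quote is legal Java today and is read as two comparisons. Because Java puts explicit method type arguments after a dot and before the method name, the parser never has to pick between the two readings.

```java
class ParseAmbiguity {
    // Hypothetical f that takes two booleans and reports them.
    static String f(boolean first, boolean second) {
        return first + "," + second;
    }

    public static void main(String[] args) {
        int a = 1, b = 2, c = 3, d = 5;
        // Legal Java: parsed as f((a < b), (c > (d))), i.e. two comparisons.
        // It cannot mean a generic call a<b, c>(d), because Java only accepts
        // explicit method type arguments after a dot, e.g. this.<B, C>a(d).
        System.out.println(f(a < b, c > (d))); // true,false
    }
}
```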

edited Sep 15 '17 at 09:30

Pier Giorgio Misley


answered Aug 17 '17 at 09:15

František Hartman



No, just ambiguity. `f(a<b,c>(d))` can be parsed either as `f(a < b, c > d)` (two `boolean`s from comparisons passed to `f`) or as `f(a<b, c>(d))` (call of `a` with type arguments `b` and `c` and value argument `d` passed to `f`). I think this might also be why Scala chose to use `[]` instead of `<>` for generics. – HTNW Aug 18 '17 at 00:28
@HTNW that's a *very* interesting point... At the moment, I find this to be the correct answer, and your comment made it so. Thank you – Eugene Aug 18 '17 at 07:59

score 10 · Answer 2 · answered Aug 08 '17 at 13:43

As far as I know, Java generics, when they were introduced, were based on GJ (an extension of the Java programming language that supports generic types). Therefore the syntax was taken from GJ; see the GJ Specification.

This is a formal answer to your question, but it does not explain the choice within GJ itself. It is clear, however, that the syntax was not taken from C++, because in C++ the template parameter section precedes both the class keyword and the return type of a function.

score 8 · Answer 3 · answered Aug 08 '17 at 12:16


My strong assumption is that it's because, as you said, for a method the generic parameter can also be the return type:

public <RETURN_TYPE> RETURN_TYPE getResult();

So by the time the compiler reaches the return type of the method, it has already encountered the type parameter (that is, it knows it's a generic type).

If you had a syntax like

public RETURN_TYPE getResult<RETURN_TYPE>();

it would require a second pass to parse.

For classes, this is not a problem, because all references to the generic type appear within the class definition block, i.e. after the generic type has been declared.
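The declaration order this answer relies on can be illustrated with a small sketch (all names are hypothetical): the method's <R> is declared before R appears as the return type, and a class's <T> is in scope for the whole class body. This only shows the order Java uses; it does not by itself prove that another order would be impossible to parse.

```java
import java.util.List;

class ReturnTypeDemo {
    // <R> is declared before R is used as the return type.
    static <R> R first(List<R> items) {
        return items.get(0);
    }

    // A class type parameter is visible throughout the class body.
    static class Holder<T> {
        private final T item;

        Holder(T item) {
            this.item = item;
        }

        T get() {
            return item;
        }
    }

    public static void main(String[] args) {
        String s = first(List.of("a", "b")); // R inferred as String
        Holder<Integer> h = new Holder<>(7);
        System.out.println(s + h.get());     // a7
    }
}
```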

answered Aug 08 '17 at 12:16

daniu



That was also one of the reasons I came up with, as you can read in the question. I am looking for something more than an assumption to back it up, though – Tim Aug 08 '17 at 12:20
Also I am not sure about this: *it would require a second sweep to parse.* Would the compiler not be able to remember it found an unknown type, and only report it as an error if it has not found a matching type introduction by the time it finishes parsing the method? – Tim Aug 08 '17 at 12:27
@TimCastelijns I think the *compiling* process is consistent: it does not have special cases and "if statements" such as "an unknown type was encountered, let's wait, there could be a generic type declaration later...". The moment an unknown type is encountered, the compiler reports the error; it does not go back and double-check. That's why generics are declared before being used. When the compiler encounters a `<..>`, I think it temporarily appends it to the **known** types list and proceeds, with no special cases and no deferred errors. That's my opinion about it – Yazan Aug 17 '17 at 12:42

EJoshuaS - Stand with Ukraine · Answer 4 · 2017-08-18T05:59:17.570

There's no profound theoretical reason for this; it appears to be a case of "the language designers just did it that way." C#, for example, uses exactly the syntax you're wondering why Java doesn't implement. The following code:

private T Test<T>(T abc)
{
    throw new NotImplementedException();
}

will compile. C# is similar enough to Java that this would imply that there's no theoretical reason that Java couldn't have implemented the same thing, too (especially given that both languages implemented generics early on in their development).

The advantage of Java's current syntax is that it makes it marginally easier to implement an LL(1) parser for method declarations.
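For comparison, here is a sketch of the same method written in Java. The name test and the use of UnsupportedOperationException (standing in for C#'s NotImplementedException) are assumptions made for the example.

```java
class CSharpComparison {
    // Java spelling of the C# method "T Test<T>(T abc)": the <T> section sits
    // before the return type instead of after the method name.
    // Package-private rather than private, only so it can be called from elsewhere.
    static <T> T test(T abc) {
        // UnsupportedOperationException plays the role of NotImplementedException.
        throw new UnsupportedOperationException();
    }

    public static void main(String[] args) {
        try {
            String s = CSharpComparison.<String>test("abc");
            System.out.println(s);
        } catch (UnsupportedOperationException e) {
            System.out.println("not implemented");
        }
    }
}
```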

*as far as I can tell* & *this could be* & 'C# does this and that'. I can't make much of that, this is exactly the kind of answer I am not looking for. Perhaps I misunderstood your point - if so could you rephrase? — Tim, Aug 16 '17 at 20:39

OLIVER.KOO · Answer 5 · 2017-08-16T19:50:48.067

The reason is that generic types and parameterized types are handled differently during compilation: during erasure, one goes through the step of eliding type parameters and the other through eliding type arguments.

Generics were added to Java in 2004 with J2SE 5.0. The Oracle article "Using and Programming Generics in J2SE 5.0" states:

Behind the Scenes

Generics are implemented by the Java compiler as a front-end conversion called erasure, which is the process of translating or rewriting code that uses generics into non-generic code (that is, maps the new syntax to the current JVM specification). In other words, this conversion erases all generic type information; all information between angle brackets is erased. For example, LinkedList<Integer> will become LinkedList. Uses of other type variables are replaced by the upper bound of the type variable (for example, Object), and when the resulting code is not type correct, a cast to the appropriate type is inserted.

The key is the process of type erasure. No JVM changes were made to support generics, so Java doesn't retain generic types past compilation.

A paper called Cost of Erasure, published by the University of New Orleans, breaks down the steps of erasure:

The steps performed during type erasure include:

Eliding type parameters : When the compiler finds the definition of a generic type or method, it removes all occurrences of each type parameter replacing it by its left most bound, or Object if no bound is specified.

Eliding type arguments: When the compiler finds a parameterized type, an instantiation of a generic type, it removes the type arguments. For instance, the type List<String> is translated to List.
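Both steps from the quote can be observed with reflection. The sketch below uses hypothetical names: the field's runtime type has lost its type argument, and the unbounded T in the method has become Object, while the full generic type is still kept as metadata.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.lang.reflect.ParameterizedType;
import java.util.List;

class ErasureSteps {
    // A parameterized type: its type argument is elided from the erased type.
    List<String> names;

    // A generic method with an unbounded type parameter: T is elided to Object.
    static <T> T identity(T t) {
        return t;
    }

    public static void main(String[] args) throws Exception {
        Field field = ErasureSteps.class.getDeclaredField("names");
        System.out.println(field.getType()); // interface java.util.List
        // The generic type survives only as metadata in the class file.
        ParameterizedType generic = (ParameterizedType) field.getGenericType();
        System.out.println(generic.getActualTypeArguments()[0]); // class java.lang.String

        Method m = ErasureSteps.class.getDeclaredMethod("identity", Object.class);
        System.out.println(m.getReturnType()); // class java.lang.Object
    }
}
```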

For a generic method, the compiler looks for the generic type definition, which is at the leftmost bound. This literally means furthest to the left, and that is why bounded type parameters appear before the method's return type. For a generic class or interface, the compiler looks for the parameterized type, which, unlike the generic type, is not at the leftmost bound of the class definition but instead follows the class name. The compiler then removes the type arguments so the JVM can understand it.

The Appendix of the Cost of Erasure paper nicely demonstrates how the compiler handles generic interfaces and methods.

Bridge Methods

When compiling a class or interface that extends a parameterized class or implements a parameterized interface, the compiler may need to create a synthetic method, called a bridge method, as part of the type erasure process. You normally don't need to worry about bridge methods, but you might be puzzled if one appears in a stack trace.

Note: In addition, the compiler sometimes needs to insert synthetic bridge methods. Bridge methods are part of the type erasure process; they make sure that method signatures still match after type erasure. Read more about it at Effects of Type Erasure and Bridge Methods.
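A minimal sketch of a bridge method, using hypothetical Node and IntNode classes: IntNode overrides setData with an Integer parameter, but the erased Node.setData takes Object, so javac adds a synthetic bridge that Method.isBridge() reports.

```java
import java.lang.reflect.Method;

class BridgeDemo {
    static class Node<T> {
        T data;

        public void setData(T data) {
            this.data = data;
        }
    }

    // After erasure the parent has setData(Object) while this class has
    // setData(Integer), so javac generates a bridge setData(Object) that
    // casts its argument and delegates to setData(Integer).
    static class IntNode extends Node<Integer> {
        @Override
        public void setData(Integer data) {
            super.setData(data);
        }
    }

    static boolean hasBridge() {
        for (Method m : IntNode.class.getDeclaredMethods()) {
            if (m.isBridge() && m.getParameterTypes()[0] == Object.class) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        for (Method m : IntNode.class.getDeclaredMethods()) {
            System.out.println(m + (m.isBridge() ? "  [bridge]" : ""));
        }
    }
}
```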

Edit: As the OP points out, my conclusion that "leftmost bound" literally means furthest to the left is not solid enough (the OP did state in the question that he is not interested in "I think" answers), so I did a little digging and found this Generics FAQ. From its example, it seems that the order of the bounds does matter: with <T extends Cloneable & Comparable<T>>, T becomes Cloneable after type erasure, not Comparable.

Here is another example, directly from Oracle's Erasure of Generic Types:

In the following example, the generic Node class uses a bounded type parameter:

public class Node<T extends Comparable<T>> {
   ...
}

The Java compiler replaces the bounded type parameter T with the first bound class, Comparable.

I think the more technically correct way to say it is that type erasure replaces the bounded type parameter with its first bound (or Object if T is unbounded); it just happens that, because of Java's syntax, the first bound is the leftmost one.
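The effect of bound order on erasure can be checked with reflection. In this sketch the method names are hypothetical and Node mirrors the Oracle example; swapping the two bounds changes the erased type. The sketch only concerns the order of the bounds inside <...>, not where that section sits relative to the return type.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;

class LeftmostBound {
    // Two bounds: erasure uses the leftmost one, Cloneable.
    static <T extends Cloneable & Comparable<T>> T pick(T t) {
        return t;
    }

    // Same bounds in the other order: now the erasure is Comparable.
    static <T extends Comparable<T> & Cloneable> T pickSwapped(T t) {
        return t;
    }

    // Mirrors the Oracle Node example: T erases to its first bound, Comparable.
    static class Node<T extends Comparable<T>> {
        T data;
    }

    public static void main(String[] args) throws Exception {
        Method m1 = LeftmostBound.class.getDeclaredMethod("pick", Cloneable.class);
        Method m2 = LeftmostBound.class.getDeclaredMethod("pickSwapped", Comparable.class);
        Field data = Node.class.getDeclaredField("data");
        System.out.println(m1.getReturnType().getSimpleName()); // Cloneable
        System.out.println(m2.getReturnType().getSimpleName()); // Comparable
        System.out.println(data.getType().getSimpleName());     // Comparable
    }
}
```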

Informative answer. *And it literally means furthest to the left and that is why the Bounded Typed Parameters appears before the method's return type*: I don't think you can draw that conclusion. I don't think the erasure policy has an effect on the order of parameters. The leftmost bound applies to the order of the generics, not to the order in which the parameters/keywords appear in the method definition – Tim, Aug 16 '17 at 18:02
I drew the conclusion from this answer: https://stackoverflow.com/questions/15296193/what-is-meant-by-left-most-bound-for-generic-type-or-a-method-and-why-was-this-p. I agree with one of the comments there: "Another reason for choosing the leftmost bound is to make the choice well defined so that the user has control over the type used in the erased class. You want this so that you can craft classes to be backwards compatible for old code. I.e. if you want the "raw" erased class to extend a particular type or an erased generic method to have a particular argument type, you can control that by choosing the bound" – OLIVER.KOO, Aug 16 '17 at 18:12
@TimCastelijns Thank you for pointing that out :) I did some searching and found some examples to back up my conclusion. To me, they do mean that the order in which the parameters appear in the method definition matters. – OLIVER.KOO, Aug 16 '17 at 18:26
I understood what you mean with the leftmost bound, but what does it have to do with the position of the type parameter in the method definition? I don't understand that conclusion — Tim, Aug 16 '17 at 20:29

Krzysztof Cichocki · Answer 6 · 2017-08-12T19:36:14.640


I think that it is because you can declare it to be the return type:

<T> T doStuff(T t) {
    // Do stuff with T
    return t;
}

You need to declare the type before you declare the return type, because you can't use something that is not yet defined; e.g., you can't use a variable x before declaring it somewhere. I like (any) language to follow some logical rules: it is then easier to use, and at some point you just know what to expect from it. This is the case with Java; it has some oddities, but in general it follows rules. The rule that you can't use something before declaring it is a very strong one in Java, and I like it because it produces fewer WTFs when you're trying to understand Java code. That's why I think this is the reasoning behind it. But I don't know exactly who is responsible for that decision; a quote from Wikipedia:

In 1998, Gilad Bracha, Martin Odersky, David Stoutamire and Philip Wadler created Generic Java, an extension to the Java language to support generic types.[3] Generic Java was incorporated in Java (2004, Java 5) with the addition of wildcards.

I think we should ask someone mentioned in the quote above to get a definitive answer as to why it is the way it is.

I don't believe it has anything to do with backward compatibility with previous versions of Java.

edited Aug 12 '17 at 19:36

answered Aug 08 '17 at 12:17

Krzysztof Cichocki


*you can't use something that is not yet defined.* generally speaking that is not true. I can use a variable in a method even if I declare it below the method. Perhaps it is different when declaring types, but I don't know why that would be so – Tim Aug 11 '17 at 13:36
This is incorrect - C# lets you do exactly what the OP is asking about. – EJoshuaS - Stand with Ukraine Aug 11 '17 at 18:44

@EJoshuaS there are more languages with the `doStuff<T>()` syntax, but we are talking about Java here. You cannot say that something is possible in Java because it is in C# – Tim Aug 11 '17 at 19:21
@TimCastelijns Yes, I do recognize that - I'm agreeing with your comment, in fact. I was giving a counterexample to the answer - I just don't think that the statement "you can't use something that is not yet defined" is correct because if it *were* correct C# wouldn't be able to do it either, so the argument here [proves too much](https://en.wikipedia.org/wiki/Proving_too_much). – EJoshuaS - Stand with Ukraine Aug 11 '17 at 19:24
@EJoshuaS The correctness of that statement completely depends on the context in which it is presented. Some languages will allow it, others won't – Tim Aug 11 '17 at 19:27
@TimCastelijns Fair enough - I just think that the way that it's phrased in the answer would imply that this isn't possible in *any* language, which simply isn't true. I don't think that that's why Java used the syntax. – EJoshuaS - Stand with Ukraine Aug 11 '17 at 19:29
@EJoshuaS I disagree with that interpretation. Thanks for explaining the reasoning though. – Tim Aug 11 '17 at 19:30
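Tim's counterexample can be checked directly. In this hypothetical class, a method body uses a field declared further down, which Java allows, while a local variable still has to be declared before it is used. So "declare before use" holds for some kinds of names in Java and not for others.

```java
class ForwardReference {
    // The method body refers to a field declared further down the class.
    int doubled() {
        return value * 2;
    }

    int value = 21;

    static int localOrder() {
        int x = 1;    // a local variable must be declared before use;
        return x + 1; // moving the declaration below this line would not compile
    }

    public static void main(String[] args) {
        System.out.println(new ForwardReference().doubled()); // 42
        System.out.println(localOrder());                     // 2
    }
}
```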

Murat Karagöz · Answer 7 · 2017-08-16T18:17:31.667

Java generics were introduced with Java 1.5. The idea behind new language features is never to break earlier versions. We have to keep in mind that generics are a type-safety feature for the language/developer. With them, two new kinds of types were introduced: parameterized types and type variables.

The JLS 4.3 Reference Types and Values defines the following syntax for reference types, including type arguments and type variables:

ReferenceType:
    ClassOrInterfaceType
    TypeVariable
    ArrayType

ClassOrInterfaceType:
    ClassType
    InterfaceType

ClassType:
    TypeDeclSpecifier TypeArguments_opt

InterfaceType:
    TypeDeclSpecifier TypeArguments_opt

TypeDeclSpecifier:
    TypeName
    ClassOrInterfaceType . Identifier

TypeName:
    Identifier
    TypeName . Identifier

TypeVariable:
    Identifier

ArrayType:
    Type [ ]

The examples are like these

Vector<String>
Seq<Seq<A>>
Seq<String>.Zipper<Integer>
Collection<Integer>
Pair<String,String>

and for parameterized types

Vector<String> x = new Vector<String>();
Vector<Integer> y = new Vector<Integer>();
return x.getClass() == y.getClass();

Whenever no bound is given, the type variable is bounded by java.lang.Object, and with type erasure a parameterized type such as Vector<String> becomes the raw type Vector, so it's backwards compatible with previous versions of Java.
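A runnable version of the JLS snippet above, wrapped in a hypothetical class: because Vector<String> and Vector<Integer> both erase to the raw type Vector, they share one runtime class and the comparison yields true.

```java
import java.util.Vector;

class SameRuntimeClass {
    static boolean sameClass() {
        Vector<String> x = new Vector<String>();
        Vector<Integer> y = new Vector<Integer>();
        // Both erase to the raw type Vector, so there is a single runtime class.
        return x.getClass() == y.getClass();
    }

    public static void main(String[] args) {
        System.out.println(sameClass()); // true
    }
}
```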

Generic methods in a class that is itself not generic have the following syntax.

From JLS 8.4 Method Declarations:

MethodDeclaration:
    MethodHeader MethodBody

MethodHeader:
    MethodModifiers_opt TypeParameters_opt Result MethodDeclarator Throws_opt

MethodDeclarator:
    Identifier ( FormalParameterList_opt )

An example looks like this

public class GenericMethod {
    public static <T> T aMethod(T anObject) {
        return anObject;
    }
    public static void main(String[] args) {
        String greeting = "Hi";
        String reply = aMethod(greeting);
    }
}

After type erasure, this becomes:

public class GenericMethod {
    public static Object aMethod(Object anObject) {
        return anObject;
    }
    public static void main(String[] args) {
        String greeting = "Hi";
        String reply = (String) aMethod(greeting);
    }
}
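The (String) cast that erasure inserts at the call site can be made visible with an unchecked method. In this sketch, unsafe is a hypothetical helper: its (T) cast erases to a no-op, so nothing fails inside it, and the ClassCastException only appears where the compiler-inserted (String) cast runs in the caller.

```java
class CallSiteCast {
    // Unchecked cast: after erasure this is just "return o", so it never fails here.
    @SuppressWarnings("unchecked")
    static <T> T unsafe(Object o) {
        return (T) o;
    }

    static <T> T aMethod(T anObject) {
        return anObject;
    }

    public static void main(String[] args) {
        String reply = aMethod("Hi"); // compiled as (String) aMethod("Hi")
        System.out.println(reply);    // Hi
        try {
            String s = CallSiteCast.<String>unsafe(42); // the inserted cast fails here
            System.out.println(s);
        } catch (ClassCastException e) {
            System.out.println("ClassCastException at the call site");
        }
    }
}
```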

And once again, it is backwards compatible with previous Java versions. See both proposal papers for more in-depth reasoning:

Adding Generics to the Java Programming Language: Participant Draft Specification

Specialization of Java Generic Types

As for the technical part: to build a Java program, you compile the .java file with the javac command, which generates the class files. JavacParser parses the whole file according to the above specification, and later compiler phases generate the bytecode. See here for the JavacParser source code.

Let's take the following Test.java file

class Things{}

class Stuff<T>{
    T t;

    public <U extends Things> U doStuff(T t, U u){
        return u;
    }
    public <T> T doStuff(T t){
        return t;
    }
}

To keep it backwards compatible, the JVM did not change its existing class file attributes. Instead, a new attribute named Signature was added. From the proposal paper:

When used as an attribute of a method or field, a signature gives the full (possibly generic) type of that method or field. When used as a class attribute, a signature indicates the type parameters of the class, followed by its supertype, followed by all its interfaces. The type syntax in signatures is extended to parameterized types and type variables. There is also a new signature syntax for formal type parameters. The syntax extensions for signature strings are as follows:

The JVM Spec 4.3.4 defines the following syntax:

MethodTypeSignature:
    FormalTypeParameters_opt ( TypeSignature* ) ReturnType ThrowsSignature*

ReturnType:
    TypeSignature
    VoidDescriptor

ThrowsSignature:
    ^ ClassTypeSignature
    ^ TypeVariableSignature

By disassembling the compiled Stuff.class with javap -v, we get the following:

class Stuff<T extends java.lang.Object> extends java.lang.Object
  minor version: 0
  major version: 52
  flags: ACC_SUPER
Constant pool:
   #1 = Methodref          #3.#20         // java/lang/Object."<init>":()V
   #2 = Class              #21            // Stuff
   #3 = Class              #22            // java/lang/Object
   #4 = Utf8               t
   #5 = Utf8               Ljava/lang/Object;
   #6 = Utf8               Signature
   #7 = Utf8               TT;
   #8 = Utf8               <init>
   #9 = Utf8               ()V
  #10 = Utf8               Code
  #11 = Utf8               LineNumberTable
  #12 = Utf8               doStuff
  #13 = Utf8               (Ljava/lang/Object;LThings;)LThings;
  #14 = Utf8               <U:LThings;>(TT;TU;)TU;
  #15 = Utf8               (Ljava/lang/Object;)Ljava/lang/Object;
  #16 = Utf8               <T:Ljava/lang/Object;>(TT;)TT;
  #17 = Utf8               <T:Ljava/lang/Object;>Ljava/lang/Object;
  #18 = Utf8               SourceFile
  #19 = Utf8               Test.java
  #20 = NameAndType        #8:#9          // "<init>":()V
  #21 = Utf8               Stuff
  #22 = Utf8               java/lang/Object
{
  T t;
    descriptor: Ljava/lang/Object;
    flags:
    Signature: #7                           // TT;

  Stuff();
    descriptor: ()V
    flags:
    Code:
      stack=1, locals=1, args_size=1
         0: aload_0
         1: invokespecial #1                  // Method java/lang/Object."<init>":()V
         4: return
      LineNumberTable:
        line 4: 0

  public <U extends Things> U doStuff(T, U);
    descriptor: (Ljava/lang/Object;LThings;)LThings;
    flags: ACC_PUBLIC
    Code:
      stack=1, locals=3, args_size=3
         0: aload_2
         1: areturn
      LineNumberTable:
        line 8: 0
    Signature: #14                          // <U:LThings;>(TT;TU;)TU;

  public <T extends java.lang.Object> T doStuff(T);
    descriptor: (Ljava/lang/Object;)Ljava/lang/Object;
    flags: ACC_PUBLIC
    Code:
      stack=1, locals=2, args_size=2
         0: aload_1
         1: areturn
      LineNumberTable:
        line 11: 0
    Signature: #16                          // <T:Ljava/lang/Object;>(TT;)TT;
}
Signature: #17                          // <T:Ljava/lang/Object;>Ljava/lang/Object;
SourceFile: "Test.java"

The method

public <U extends Things> U doStuff(T t, U u){
    return u;
}

gets a Signature attribute indicating that it is a generic method:

   Signature: #14                          // <U:LThings;>(TT;TU;)TU;

If we wrote a non-generic method, as in versions before Java 1.5, e.g.

public String doObjectStuff(Object t, String u){
    return u;
}

would translate to

 public java.lang.String doObjectStuff(java.lang.Object, java.lang.String);
    descriptor: (Ljava/lang/Object;Ljava/lang/String;)Ljava/lang/String;
    flags: ACC_PUBLIC
    Code:
      stack=1, locals=3, args_size=3
         0: aload_2
         1: areturn
      LineNumberTable:
        line 12: 0

The only difference between the two is that one has a Signature attribute indicating that it is a generic method, while the pre-1.5-style method does not have one. Both are otherwise described by an ordinary descriptor attribute:

Non-Generic method
 descriptor: (Ljava/lang/Object;Ljava/lang/String;)Ljava/lang/String;
Generic method 
 descriptor: (Ljava/lang/Object;LThings;)LThings;

This makes it backwards compatible. So the answer would be, as you suggested,

"the language designers just made it that way"

with the addition of

"the language designers just made it that way, to make it backwards compatible while not adding much code"
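The split between the descriptor and the Signature attribute is also visible through reflection. The sketch below uses a trimmed copy of the Stuff and Things classes from the example above (only the bounded doStuff is kept, and SignatureDemo is a hypothetical helper); getReturnType() reflects the erased descriptor, while getTypeParameters() is rebuilt from the Signature attribute.

```java
import java.lang.reflect.Method;
import java.lang.reflect.TypeVariable;

class Things {}

// Trimmed copy of the answer's Stuff class, keeping one generic method.
class Stuff<T> {
    T t;

    public <U extends Things> U doStuff(T t, U u) {
        return u;
    }
}

class SignatureDemo {
    static Method lookup() throws NoSuchMethodException {
        // Look the method up by its erased descriptor: (Object, Things) -> Things.
        return Stuff.class.getMethod("doStuff", Object.class, Things.class);
    }

    public static void main(String[] args) throws Exception {
        Method m = lookup();
        System.out.println(m.getReturnType().getSimpleName()); // Things (descriptor)
        TypeVariable<Method> u = m.getTypeParameters()[0];
        System.out.println(u.getName());                       // U (Signature)
        System.out.println(Stuff.class.getTypeParameters()[0].getName()); // T
    }
}
```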

EDIT: Regarding the comment that a different syntax should be easy to handle, I found a passage in the book Java Generics and Collections by Maurice Naftalin and Philip Wadler:

Generics in Java resemble templates in C++. There are just two important things to bear in mind about the relationship between Java generics and C++ templates: syntax and semantics. The syntax is deliberately similar and the semantics are deliberately different.
Syntactically, angle brackets were chosen because they are familiar to C++ users, and because square brackets would be hard to parse. However, there is one difference in syntax. In C++, nested parameters require extra spaces, so you see things like this: List< List<String> > [...] etc.

See here

I probably missed the point, but how is the position of `<>` in the source code in any way related to type erasure and backward compatibility? Wherever you put `<>`, you get source code that is incompatible with the original compiler anyway. So why is this specific choice of location any more "backward compatible" than the one used in C++ or the one used in C#? – SergGr, Aug 12 '17 at 13:33
@SergGr I meant that generally by erasing the parameterized type it will have the same methodical structure as the previous jre version. Which means that the adjustment of the javac parser is straight forward that way. If the `<>` brackets were completely out of position the developers had to add probably way more lines to the parser — Murat Karagöz, Aug 14 '17 at 08:28
I'm not a professional compiler developer, but I'm not sure how much more code it would require to move the declaration the way it is done in C#. And the C++ syntax for classes is even more in line with this suggestion than what was actually done in Java. So why not that? Generally, this answer looks bad to me with regard to signal/noise ratio. – SergGr, Aug 14 '17 at 11:28
@SergGr They tried to make minimal changes to the Java parser to support the new generic syntax. I found a passage in the Java Generics book. My answer about the position is still that it was easier to parse and kept things backwards compatible, e.g. for retrofitting non-generic library interfaces with the generic syntax. `C#` did not care about the last aspect. I can't tell you more than what is available in the white papers and JSR drafts. This question has an extremely specific nature and I tried to dig into it. – Murat Karagöz, Aug 16 '17 at 18:23
