Different generic behaviour when using lambda instead of explicit anonymous inner class

Question

The context

I'm working on a project that is heavily dependent on generic types. One of its key components is the so-called TypeToken, which provides a way of representing generic types at runtime and applying some utility functions to them. To work around Java's type erasure, I'm using the curly-brace notation ({}) to create an automatically generated anonymous subclass, since this keeps the type argument available at runtime (effectively making the type reifiable).

What `TypeToken` basically does

This is a heavily simplified version of TypeToken that is far more lenient than the original implementation. I'm using this stripped-down version to make sure the real problem doesn't lie in one of those utility functions.

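import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

public class TypeToken<T> {

    private final Type type;
    private final Class<T> rawType;

    private final int hashCode;


    /* ==== Constructor ==== */

    @SuppressWarnings("unchecked")
    protected TypeToken() {
        // For `new TypeToken<X>() {}`, the anonymous subclass's generic superclass is TypeToken<X>
        ParameterizedType paramType = (ParameterizedType) this.getClass().getGenericSuperclass();
        this.type = paramType.getActualTypeArguments()[0];

        // ... (rawType and hashCode are initialized here; omitted)
    }

    // ... (utility methods omitted)
}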

When it works

This implementation works in almost every situation and handles most types without any problem. For example, the following work as expected:

TypeToken<List<String>> listToken = new TypeToken<List<String>>() {};
TypeToken<List<? extends CharSequence>> wildcardToken = new TypeToken<List<? extends CharSequence>>() {};
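To try the capture technique in isolation, here is a minimal sketch built on a hypothetical, stripped-down `SimpleTypeToken` (not the project's real `TypeToken`; it only records the captured `Type`). The printed strings assume the JDK's built-in `toString()` format for parameterized and wildcard types.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.List;

// Hypothetical stripped-down token: records only the captured type argument
abstract class SimpleTypeToken<T> {
    private final Type type;

    protected SimpleTypeToken() {
        // getGenericSuperclass() is rebuilt from the anonymous subclass's generic signature
        ParameterizedType superType = (ParameterizedType) getClass().getGenericSuperclass();
        this.type = superType.getActualTypeArguments()[0];
    }

    Type getType() {
        return type;
    }
}

class SimpleTypeTokenDemo {
    static Type listOfStrings() {
        return new SimpleTypeToken<List<String>>() {}.getType();
    }

    static Type listOfCharSequences() {
        return new SimpleTypeToken<List<? extends CharSequence>>() {}.getType();
    }

    public static void main(String[] args) {
        System.out.println(listOfStrings());       // java.util.List<java.lang.String>
        System.out.println(listOfCharSequences()); // java.util.List<? extends java.lang.CharSequence>
    }
}
```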

Since it doesn't validate the types, the implementation above accepts every type the compiler permits, including type variables:

<T> void test() {
    TypeToken<T[]> token = new TypeToken<T[]>() {};
}

In this case, type is a GenericArrayType holding a TypeVariable as its component type. This is perfectly fine.
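The generic-array case can be reproduced the same way. This sketch assumes a hypothetical `ArrayToken` class with the same constructor logic and simply reports what the component type of the captured `T[]` turned out to be:

```java
import java.lang.reflect.GenericArrayType;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.lang.reflect.TypeVariable;

// Hypothetical minimal token with the same constructor logic as TypeToken
abstract class ArrayToken<T> {
    final Type type;

    protected ArrayToken() {
        type = ((ParameterizedType) getClass().getGenericSuperclass()).getActualTypeArguments()[0];
    }
}

class GenericArrayDemo {
    // The anonymous class's enclosing method (this one) declares T, so T can be resolved
    static <T> Type arrayOfT() {
        return new ArrayToken<T[]>() {}.type;
    }

    static String describeComponent() {
        GenericArrayType array = (GenericArrayType) arrayOfT();
        Type component = array.getGenericComponentType();
        if (component instanceof TypeVariable) {
            return "TypeVariable " + ((TypeVariable<?>) component).getName();
        }
        return String.valueOf(component);
    }

    public static void main(String[] args) {
        System.out.println(describeComponent()); // TypeVariable T
    }
}
```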

The weird situation when using lambdas

However, things change when you instantiate a TypeToken inside a lambda expression (the type variable T comes from the test method above):

Supplier<TypeToken<T[]>> sup = () -> new TypeToken<T[]>() {};

In this case, type is still a GenericArrayType, but it holds null as its component type.

But if you use an explicit anonymous inner class instead of the lambda, things change again:

Supplier<TypeToken<T[]>> sup = new Supplier<TypeToken<T[]>>() {
        @Override
        public TypeToken<T[]> get() {
            return new TypeToken<T[]>() {};
        }
    };

In this case, the component type once again holds the correct value (a TypeVariable).
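A side-by-side sketch of the two variants, assuming a hypothetical minimal `Token` class with the same constructor as the TypeToken above. Only the anonymous-class result is stated here; the lambda result depends on the JDK/javac version used to compile and run it.

```java
import java.lang.reflect.GenericArrayType;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.function.Supplier;

// Hypothetical minimal token with the same constructor logic as TypeToken
abstract class Token<T> {
    final Type type;

    protected Token() {
        type = ((ParameterizedType) getClass().getGenericSuperclass()).getActualTypeArguments()[0];
    }
}

class LambdaVsAnonymousDemo {
    // Component type of the captured T[] when the token is created inside a lambda
    static <T> Type componentViaLambda() {
        Supplier<Token<T[]>> sup = () -> new Token<T[]>() {};
        return ((GenericArrayType) sup.get().type).getGenericComponentType();
    }

    // Same, but the token is created inside an explicit anonymous Supplier
    static <T> Type componentViaAnonymousClass() {
        Supplier<Token<T[]>> sup = new Supplier<Token<T[]>>() {
            @Override
            public Token<T[]> get() {
                return new Token<T[]>() {};
            }
        };
        return ((GenericArrayType) sup.get().type).getGenericComponentType();
    }

    public static void main(String[] args) {
        // On JDKs affected by the behavior described in the question, this prints null
        System.out.println("lambda:    " + componentViaLambda());
        System.out.println("anonymous: " + componentViaAnonymousClass()); // T
    }
}
```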

The resulting questions

What happens to the TypeVariable in the lambda-example? Why does the type inference not respect the generic type?
What is the difference between the explicitly-declared and the implicitly-declared example? Is type inference the only difference?
How can I fix this without using the boilerplate explicit declaration? This becomes especially important in unit testing since I want to check whether the constructor throws exceptions or not.

To clarify: this problem isn't really relevant to the program itself, since I do NOT allow non-resolvable types at all, but it's still an interesting phenomenon I'd like to understand.

My research

Update 1

Meanwhile, I've done some research on this topic. In the Java Language Specification §15.12.2.2, I found a term that might have something to do with it: "pertinent to applicability", whose definition lists "implicitly typed lambda expressions" as an exception. That section is clearly about something else, but the term is used in other places too, including the chapter on type inference.

But to be honest, I haven't yet figured out what all of those operators like := or F_i θ mean, which makes it really hard to understand the section in detail. I'd be glad if someone could clarify this a bit and tell me whether it might explain the weird behavior.

Update 2

I've reconsidered that approach and concluded that, even if the compiler removed the type because it's not "pertinent to applicability", that wouldn't justify setting the component type to null instead of the most general type, Object. I cannot think of a single reason why the language designers would have decided to do so.

Update 3

I've just retested the same code with the latest version of Java (I used 8u191 before). Unfortunately, this hasn't changed anything, even though Java's type inference has been improved since then.

Update 4

A few days ago, I filed a report in the official Java Bug Database, and it has just been accepted. Since the developers who reviewed my report assigned it priority P4, it might take a while until it's fixed. You can find the report here.

A huge shoutout to Tom Hawtin - tackline for pointing out that this might be a genuine bug in Java SE itself. A report by Mike Strobel would probably be far more detailed than mine, given his impressive background knowledge, but his answer wasn't yet available when I wrote my report.

You can't declare type tokens generically and expect them to do anything useful. If you could, there would be no need for type tokens. If you want a generic type token, you have to pass in an instance constructed in non-generic code. — Andy Turner, Oct 29 '18 at 07:15
I'm totally aware of the fact that the constructor above doesn't resolve the type variable, but that's not what `TypeToken` is intended to do. It should store distinct, generically defined type information and perform some operations on it. Obviously, it's impossible to tell which type T is; I just wonder why I cannot access the TypeVariable information when the constructor is called from a lambda, but I can when I call the same constructor from an explicitly declared anonymous inner class. — Quaffel, Oct 29 '18 at 17:32
By TypeVariable information I mean the representation by the java.lang.reflect classes, not the actual type information. — Quaffel, Oct 29 '18 at 17:33
Probably this is the answer to your question: https://stackoverflow.com/a/25613179/1110815 — Daniel Dietrich, Nov 11 '18 at 00:48
I do not understand what the word `reifiable` means (first paragraph). — TT., Dec 08 '18 at 07:54
@TT. https://docs.oracle.com/javase/tutorial/java/generics/nonReifiableVarargsType.html — DodgyCodeException, Dec 10 '18 at 14:51
I'm not very familiar with low-level JVM internals, but I ran into a similar problem with lambdas' generic types after trying `getGenericSuperclass` just like you're doing, and [typetools](https://github.com/jhalterman/typetools) eventually solved it for me — a.l., Dec 15 '18 at 05:37
This turned out to be a far more interesting question than I expected, and it exposed a slew of apparent long-standing bugs in both the compiler and core reflection APIs. With luck, we will at least see them fixed in JDK 11, and hopefully even backported to JDK 8. Thanks for posting! — Mike Strobel, Dec 17 '18 at 15:13

Mike Strobel · Accepted Answer · 2018-12-17T15:16:14.273

tldr:

There is a bug in javac that records the wrong enclosing method for lambda-embedded inner classes. As a result, type variables on the actual enclosing method cannot be resolved by those inner classes.

There are arguably two sets of bugs in the java.lang.reflect API implementation:

Some methods are documented as throwing exceptions when nonexistent types are encountered, but they never do. Instead, they allow null references to propagate.

The various Type::toString() overrides currently throw or propagate a NullPointerException when a type cannot be resolved.

The answer has to do with the generic signatures that usually get emitted in class files that make use of generics.

Typically, when you write a class that has one or more generic supertypes, the Java compiler will emit a Signature attribute containing the fully parameterized generic signature(s) of the class's supertype(s). I've written about these before, but the short explanation is this: without them, it would not be possible to consume generic types as generic types unless you happened to have the source code. Due to type erasure, information about type variables gets lost at compilation time. If that information were not included as extra metadata, neither the IDE nor your compiler would know that a type was generic, and you could not use it as such. Nor could the compiler emit the necessary runtime checks to enforce type safety.
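A small sketch of what that metadata buys you, using only `java.util.ArrayList` (the class name `ErasureDemo` is made up): an ordinary instance carries no trace of its type argument, while an anonymous subclass's `getGenericSuperclass()` returns the parameterization that javac recorded in the subclass's `Signature` attribute.

```java
import java.util.ArrayList;

class ErasureDemo {
    // Plain instance: its class is just ArrayList, so the String argument is gone at runtime
    static String plainSuperclass() {
        return new ArrayList<String>().getClass().getGenericSuperclass().getTypeName();
    }

    // Anonymous subclass: ArrayList<String> is recorded as its generic superclass
    static String anonymousSuperclass() {
        return new ArrayList<String>() {}.getClass().getGenericSuperclass().getTypeName();
    }

    // The erased (raw) superclass of the same anonymous subclass
    static String anonymousErasedSuperclass() {
        return new ArrayList<String>() {}.getClass().getSuperclass().getName();
    }

    public static void main(String[] args) {
        System.out.println(plainSuperclass());           // java.util.AbstractList<E>
        System.out.println(anonymousSuperclass());       // java.util.ArrayList<java.lang.String>
        System.out.println(anonymousErasedSuperclass()); // java.util.ArrayList
    }
}
```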

javac will emit generic signature metadata for any type or method whose signature contains type variables or a parameterized type, which is why you are able to obtain the original generic supertype information for your anonymous types. For example, the anonymous type created here:

TypeToken<?> token = new TypeToken<List<? extends CharSequence>>() {};

...contains this Signature:

LTypeToken<Ljava/util/List<+Ljava/lang/CharSequence;>;>;

From this, the java.lang.reflect APIs can parse the generic supertype information about your (anonymous) class.

But we already know that this works just fine when the TypeToken is parameterized with concrete types. Let's look at a more relevant example, where its type parameter includes a type variable:

static <F> void test() {
    TypeToken sup = new TypeToken<F[]>() {};
}

Here, we get the following signature:

LTypeToken<[TF;>;

Makes sense, right? Now, let's look at how the java.lang.reflect APIs are able to extract generic supertype information from these signatures. If we peer into Class::getGenericSuperclass(), we see that the first thing it does is call getGenericInfo(). If we haven't called into this method before, a ClassRepository gets instantiated:

private ClassRepository getGenericInfo() {
    ClassRepository genericInfo = this.genericInfo;
    if (genericInfo == null) {
        String signature = getGenericSignature0();
        if (signature == null) {
            genericInfo = ClassRepository.NONE;
        } else {
            // !!!  RELEVANT LINE HERE:  !!!
            genericInfo = ClassRepository.make(signature, getFactory());
        }
        this.genericInfo = genericInfo;
    }
    return (genericInfo != ClassRepository.NONE) ? genericInfo : null;
}

The critical piece here is the call to getFactory(), which expands to:

CoreReflectionFactory.make(this, ClassScope.make(this))

ClassScope is the bit we care about: this provides a resolution scope for type variables. Given a type variable name, the scope gets searched for a matching type variable. If one is not found, the 'outer' or enclosing scope is searched:

public TypeVariable<?> lookup(String name) {
    TypeVariable<?>[] tas = getRecvr().getTypeParameters();
    for (TypeVariable<?> tv : tas) {
        if (tv.getName().equals(name)) {return tv;}
    }
    return getEnclosingScope().lookup(name);
}

And, finally, the key to it all (from ClassScope):

protected Scope computeEnclosingScope() {
    Class<?> receiver = getRecvr();

    Method m = receiver.getEnclosingMethod();
    if (m != null)
        // Receiver is a local or anonymous class enclosed in a method.
        return MethodScope.make(m);

    // ...
}

If a type variable (e.g., F) is not found on the class itself (e.g., the anonymous TypeToken<F[]>), then the next step is to search the enclosing method. If we look at the disassembled anonymous class, we see this attribute:

EnclosingMethod: LambdaTest.test()V

The presence of this attribute means that computeEnclosingScope will produce a MethodScope for the generic method static <F> void test(). Since test declares the type variable F, we find it when we search the enclosing scope.
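To make the lookup order concrete, here is a rough re-implementation using only public reflection methods (`getTypeParameters`, `getEnclosingMethod`, `getEnclosingConstructor`, `getEnclosingClass`). It is a hypothetical approximation of the `ClassScope`/`MethodScope` chain described above, not the JDK's actual code:

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Method;
import java.lang.reflect.TypeVariable;

// Hypothetical approximation of the ClassScope -> MethodScope -> ClassScope lookup chain
class ScopeWalk {
    static TypeVariable<?> lookup(Class<?> c, String name) {
        TypeVariable<?> tv = find(c.getTypeParameters(), name);
        if (tv != null) return tv;
        Method m = c.getEnclosingMethod();
        if (m != null) {
            tv = find(m.getTypeParameters(), name);
            return tv != null ? tv : lookup(m.getDeclaringClass(), name);
        }
        Constructor<?> k = c.getEnclosingConstructor();
        if (k != null) {
            tv = find(k.getTypeParameters(), name);
            return tv != null ? tv : lookup(k.getDeclaringClass(), name);
        }
        Class<?> outer = c.getEnclosingClass();
        return outer != null ? lookup(outer, name) : null; // null: not in any enclosing scope
    }

    private static TypeVariable<?> find(TypeVariable<?>[] candidates, String name) {
        for (TypeVariable<?> tv : candidates) {
            if (tv.getName().equals(name)) return tv;
        }
        return null;
    }

    // Anonymous class declared directly in a generic method (EnclosingMethod = this method)
    static <F> Class<?> anonymousInGenericMethod() {
        return new Object() {}.getClass();
    }

    public static void main(String[] args) {
        TypeVariable<?> f = lookup(anonymousInGenericMethod(), "F");
        System.out.println(f + " declared by " + f.getGenericDeclaration());
    }
}
```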

So, why doesn't it work inside a lambda?

To answer this, we must understand how lambdas get compiled. The body of the lambda gets moved into a synthetic static method. At the point where we declare our lambda, an invokedynamic instruction gets emitted, which causes a class implementing the functional interface (here, Supplier) to be generated the first time we hit that instruction. The anonymous TypeToken subclass itself is still compiled ahead of time by javac.
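The two classes involved can be told apart at runtime. In this sketch (class and method names are made up), the object produced by the lambda expression is an instance of a class generated by the lambda metafactory, which is marked synthetic, while the anonymous class inside the lambda body is an ordinary class compiled by javac:

```java
import java.util.function.Supplier;

class LambdaClassDemo {
    static Supplier<Object> makeSupplier() {
        // The lambda body becomes a synthetic method of LambdaClassDemo;
        // the anonymous class gets its own class file, e.g. LambdaClassDemo$1
        return () -> new Object() {};
    }

    public static void main(String[] args) {
        Supplier<Object> sup = makeSupplier();
        Class<?> lambdaClass = sup.getClass();    // generated at runtime for the Supplier
        Class<?> anonClass = sup.get().getClass(); // compiled by javac
        System.out.println(lambdaClass.getName() + " synthetic=" + lambdaClass.isSynthetic());
        System.out.println(anonClass.getName() + " anonymous=" + anonClass.isAnonymousClass());
    }
}
```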

In this example, the static method generated for the lambda body would look something like this (if decompiled):

private static /* synthetic */ Object lambda$test$0() {
    return new LambdaTest$1();
}

...where LambdaTest$1 is your anonymous class. Let's disassemble that and inspect our attributes:

Signature: LTypeToken<[TF;>;
EnclosingMethod: LambdaTest.lambda$test$0()Ljava/lang/Object;

Just like the case where we instantiated an anonymous type outside of a lambda, the signature contains the type variable F. But EnclosingMethod refers to the synthetic method.

The synthetic method lambda$test$0() does not declare type variable F. Moreover, lambda$test$0() is not enclosed by test(), so the declaration of F is not visible inside it. Your anonymous class has a supertype containing a type variable that the class doesn't know about because it's out of scope.

When we call getGenericSuperclass(), the scope hierarchy for LambdaTest$1 does not contain F, so the parser cannot resolve it. Due to how the code is written, this unresolved type variable results in null being placed inside the type arguments of the generic supertype.
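You can observe where `EnclosingMethod` points directly through `Class::getEnclosingMethod()`. In this sketch (hypothetical names), only the first result is predictable; the second depends on whether the javac that compiled it has the behavior described here:

```java
import java.lang.reflect.Method;
import java.util.function.Supplier;

class EnclosingMethodDemo {
    // Anonymous class declared directly in a generic method
    static <W> Class<?> declaredDirectly() {
        return new Object() {}.getClass();
    }

    // Anonymous class declared inside a lambda in a generic method
    static <W> Class<?> declaredInLambda() {
        Supplier<Object> sup = () -> new Object() {};
        return sup.get().getClass();
    }

    // Name of the method reported by the EnclosingMethod attribute, or null if there is none
    static String enclosingMethodName(Class<?> c) {
        Method m = c.getEnclosingMethod();
        return m == null ? null : m.getName();
    }

    public static void main(String[] args) {
        System.out.println(enclosingMethodName(declaredDirectly())); // declaredDirectly
        // With the javac behavior described above, this prints a synthetic name
        // such as lambda$declaredInLambda$0 instead of declaredInLambda
        System.out.println(enclosingMethodName(declaredInLambda()));
    }
}
```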

Note that, had your lambda instantiated a type that did not refer to any type variables (e.g., TypeToken<String>), you would not run into this problem.

Conclusions

(i) There is a bug in javac. The Java Virtual Machine Specification §4.7.7 ("The EnclosingMethod Attribute") states:

It is the responsibility of a Java compiler to ensure that the method identified via the method_index is indeed the closest lexically enclosing method of the class that contains this EnclosingMethod attribute. (emphasis mine)

Currently, javac seems to determine the enclosing method after the lambda rewriter runs its course, and as a result, the EnclosingMethod attribute refers to a method that never even existed in the lexical scope. If EnclosingMethod reported the actual lexically enclosing method, the type variables on that method could be resolved by the lambda-embedded classes, and your code would produce the expected results.

It is arguably also a bug that the signature parser/reifier silently allows a null type argument to be propagated into a ParameterizedType (which, as @tom-hawtin-tackline points out, has ancillary effects like toString() throwing an NPE).

My bug report for the EnclosingMethod issue is now online.

(ii) There are arguably multiple bugs in java.lang.reflect and its supporting APIs.

The method ParameterizedType::getActualTypeArguments() is documented as throwing a TypeNotPresentException when "any of the actual type arguments refers to a non-existent type declaration". That description arguably covers the case where a type variable is not in scope. GenericArrayType::getGenericComponentType() should throw a similar exception when "the underlying array type's type refers to a non-existent type declaration". Currently, neither appears to throw a TypeNotPresentException under any circumstances.

I would also argue that the various Type::toString overrides should merely fill in the canonical name of any unresolved types rather than throwing an NPE or any other exception.
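Until the reflection APIs behave as documented, a caller can check for the null themselves. The following is a hypothetical helper (not part of `java.lang.reflect`) that walks a captured `Type` and throws a `TypeNotPresentException` instead of letting a null component propagate. It deliberately avoids calling `toString()` on the suspect type, since that is exactly what throws the NPE:

```java
import java.lang.reflect.GenericArrayType;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.lang.reflect.WildcardType;
import java.util.ArrayList;

// Hypothetical helper: reports whether any component of a Type came back as null
class ResolvedCheck {
    static boolean isFullyResolved(Type t) {
        if (t == null) return false;
        if (t instanceof ParameterizedType) {
            for (Type arg : ((ParameterizedType) t).getActualTypeArguments()) {
                if (!isFullyResolved(arg)) return false;
            }
            return true;
        }
        if (t instanceof GenericArrayType) {
            return isFullyResolved(((GenericArrayType) t).getGenericComponentType());
        }
        if (t instanceof WildcardType) {
            WildcardType w = (WildcardType) t;
            for (Type b : w.getUpperBounds()) if (!isFullyResolved(b)) return false;
            for (Type b : w.getLowerBounds()) if (!isFullyResolved(b)) return false;
            return true;
        }
        return true; // a Class or a TypeVariable
    }

    // Fails fast instead of returning a Type with a null hidden inside it
    static Type requireResolved(Type t) {
        if (!isFullyResolved(t)) {
            throw new TypeNotPresentException("<unresolved type variable>", null);
        }
        return t;
    }

    // Simulates the broken result: an array type whose component type is null
    static GenericArrayType brokenArrayType() {
        return () -> null;
    }

    static Type listOfStrings() {
        return new ArrayList<String>() {}.getClass().getGenericSuperclass();
    }

    public static void main(String[] args) {
        System.out.println(isFullyResolved(listOfStrings()));   // true
        System.out.println(isFullyResolved(brokenArrayType())); // false
    }
}
```

Calling something like `requireResolved` from a token's constructor would also give unit tests a concrete exception to check for.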

I have submitted a bug report for these reflection-related issues, and I will post the link once it is publicly visible.

Workarounds?

If you need to be able to reference a type variable declared by the enclosing method, then you can't do that with a lambda; you'll have to fall back to the longer anonymous type syntax. However, the lambda version should work in most other cases. You should even be able to reference type variables declared by the enclosing class. For example, these should always work:

class Test<X> {
    void test() {
        Supplier<TypeToken<X>> s1 = () -> new TypeToken<X>() {};
        Supplier<TypeToken<String>> s2 = () -> new TypeToken<String>() {};
        Supplier<TypeToken<List<String>>> s3 = () -> new TypeToken<List<String>>() {};
    }
}
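A runnable variant of the class-level case, assuming a hypothetical minimal `CapturingToken` with the same constructor logic: `X` can be found because the lookup falls back from the enclosing method (synthetic or not) to the class that declares that method, and that class declares `X`.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.function.Supplier;

// Hypothetical minimal token with the same constructor logic as TypeToken
abstract class CapturingToken<T> {
    final Type type;

    protected CapturingToken() {
        type = ((ParameterizedType) getClass().getGenericSuperclass()).getActualTypeArguments()[0];
    }
}

// Mirrors the Test<X> example: X is declared by the class, not by a method
class ClassLevelVariable<X> {
    Type fromLambda() {
        Supplier<CapturingToken<X>> s = () -> new CapturingToken<X>() {};
        return s.get().type;
    }

    public static void main(String[] args) {
        System.out.println(new ClassLevelVariable<String>().fromLambda()); // X
    }
}
```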

Unfortunately, given that this bug has apparently existed since lambdas were first introduced, and it has not been fixed in the most recent LTS release, you may have to assume the bug remains in your clients’ JDKs long after it gets fixed, assuming it gets fixed at all.

Answer updated with supporting evidence of bugs in both `javac` and the `java.lang.reflect` APIs. — Mike Strobel, Dec 17 '18 at 14:57
One question though, What do you mean by "Nor could the compiler emit the necessary runtime checks to enforce type safety"? What are those runtime checks that the compiler emits? — Ranjith Suranga, Nov 12 '21 at 22:07

score 1 · Answer 2 · answered Dec 09 '18 at 17:07


As a workaround, you can move the creation of the TypeToken out of the lambda into a separate method, and still use a lambda instead of a fully declared class:

static <T> TypeToken<T[]> createTypeToken() {
    return new TypeToken<T[]>() {};
}

Supplier<TypeToken<T[]>> sup = () -> createTypeToken();


Alexei Kaigorodov



This doesn't really get him the same behavior: the resulting `TypeToken` will have different type parameters than the version using anonymous classes. Specifically, the result will always have a generic superclass of `TypeToken<T[]>`, where `T` is the type variable declared by `createTypeToken`. Presumably the OP wants to instantiate a `TypeToken` with a specific type argument and be able to resolve that same type at runtime. – Mike Strobel Dec 11 '18 at 16:46
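Which declaration owns the captured type variable can be checked with `TypeVariable::getGenericDeclaration()`. In this sketch (hypothetical `ArrTypeToken` and demo names), the captured `T` is declared by `createTypeToken`, not by the method that calls it:

```java
import java.lang.reflect.GenericArrayType;
import java.lang.reflect.Method;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.lang.reflect.TypeVariable;
import java.util.function.Supplier;

// Hypothetical minimal token with the same constructor logic as TypeToken
abstract class ArrTypeToken<T> {
    final Type type;

    protected ArrTypeToken() {
        type = ((ParameterizedType) getClass().getGenericSuperclass()).getActualTypeArguments()[0];
    }
}

class HelperMethodDemo {
    // The workaround: the anonymous class is declared in a named generic method
    static <T> ArrTypeToken<T[]> createTypeToken() {
        return new ArrTypeToken<T[]>() {};
    }

    // Returns the name of the method that declares the captured type variable
    static <T> String ownerOfCapturedVariable() {
        Supplier<ArrTypeToken<T[]>> sup = () -> createTypeToken();
        Type component = ((GenericArrayType) sup.get().type).getGenericComponentType();
        TypeVariable<?> variable = (TypeVariable<?>) component;
        return ((Method) variable.getGenericDeclaration()).getName();
    }

    public static void main(String[] args) {
        System.out.println(ownerOfCapturedVariable()); // createTypeToken
    }
}
```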

score 1 · Answer 3 · answered Dec 10 '18 at 14:29


I've not found the relevant part of the spec, but here's a partial answer.

There's certainly a bug with the component type being null. To be clear, this is TypeToken.type from above cast to GenericArrayType (yuck!) with the method getGenericComponentType invoked. The API docs do not explicitly mention whether the null returned is valid or not. However, the toString method throws NullPointerException, so there is definitely a bug (at least in the random version of Java I am using).

I don't have a bugs.java.com account, so can't report this. Someone should.

Let's have a look at the class files generated.

javap -private YourClass

This should produce a listing containing something like:

static <T> void test();
private static TypeToken lambda$test$0();

Notice that our explicit test method has its type parameter, but the synthetic lambda method does not. You might expect something like:

static <T> void test();
private static <T> TypeToken<T[]> lambda$test$0(); /*** DOES NOT HAPPEN ***/
             // ^ name copied from `test`
                          // ^^^ `Object[]` would not make sense

Why doesn't this happen? Presumably because this would be a method type parameter in a context where a class type parameter is required, and they are surprisingly different things. There is also a restriction that prevents lambdas from declaring their own type parameters, apparently because there is no explicit syntax for them (some people may suggest this seems like a poor excuse).
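The `javap` observation can also be made through reflection. This sketch (hypothetical class name) lists the methods javac generated and counts the synthetic `lambda$...` methods that declare type parameters; with javac this count is expected to be zero, since the synthetic method is emitted with erased types only:

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.function.Supplier;

class SyntheticMethodDemo {
    // Mirrors the question: a generic method whose lambda creates an anonymous generic subclass
    static <T> Object test() {
        Supplier<Object> sup = () -> new ArrayList<T>() {};
        return sup.get();
    }

    // Returns {number of synthetic lambda$ methods, number of those declaring type parameters}
    static int[] lambdaMethodStats() {
        int lambdas = 0;
        int generic = 0;
        for (Method m : SyntheticMethodDemo.class.getDeclaredMethods()) {
            if (m.isSynthetic() && m.getName().startsWith("lambda$")) {
                lambdas++;
                if (m.getTypeParameters().length > 0) {
                    generic++;
                }
            }
        }
        return new int[] {lambdas, generic};
    }

    public static void main(String[] args) {
        // Roughly what `javap -private` shows, including generic signatures
        for (Method m : SyntheticMethodDemo.class.getDeclaredMethods()) {
            System.out.println(m.toGenericString() + (m.isSynthetic() ? "  // synthetic" : ""));
        }
    }
}
```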

Conclusion: There is at least one unreported JDK bug here. The reflect API and this lambda+generics part of the language is not to my taste.


Tom Hawtin - tackline



In my opinion, the synthetic method should not get type parameters. After all, such type parameter(s) would be unrelated to `test`’s type parameter(s), so the reported parameterization of `TypeToken` would create even more confusion. The right behavior would be for the local class to report `test()` as its enclosing method, as lambda expressions are supposed to behave as they would in their surrounding context, i.e., they should not reflect the way they are compiled. – Holger Dec 12 '18 at 13:06
@Holger Agreed. I tend to interpret “enclosed in” as “declared in”, i.e., as declared by the developer who wrote the code. A class cannot be declared in a synthetic method because that method never existed in source code form. – Mike Strobel Dec 14 '18 at 02:37
@MikeStrobel `java.lang.Class` has both `getEnclosingClass` and `getDeclaringClass`. Enclosing relates to the immediately enclosing class (perhaps itself a nested class). Declaring is the same as enclosing for member classes, but reports `null` for local, anonymous, and top-level classes. So we can be sure this is about the fiction of inner classes, and not real JVM classes or synthetic methods. / I guess in slightly different circumstances the type parameter of `test` may be non-denotable (perhaps, I think, not sure). – Tom Hawtin - tackline Dec 14 '18 at 10:10
@TomHawtin-tackline Allow me to rephrase: the **enclosing** method or class is the nearest outer method or class scope that encompasses the **declaration site**. This refers to source code scope (pre-compilation). The `EnclosingMethod` of an anonymous inner class declared in `m()` is `m()`, even though the compiled inner class lives outside both `m()` and its enclosing class. It would be consistent for an anonymous class declared in a lambda inside of `m()` to have `m()` as its `EnclosingMethod`. It shouldn't matter that the compiler moved its instantiation. – Mike Strobel Dec 14 '18 at 13:57
According to the [JVMS §4.7.7](https://docs.oracle.com/javase/specs/jvms/se11/html/jvms-4.html#jvms-4.7.7), this is indeed a bug. `EnclosingMethod` explicitly refers to the "closest **lexically enclosing** method" of the class containing the attribute (which, by definition, cannot be a synthesized method that did not exist in the source code). Further, the javadocs for various `java.lang.reflect` APIs suggest that the current implementation is failing to throw the appropriate exceptions when a type cannot be resolved. I've updated my answer accordingly, and also filed bug reports with Oracle. – Mike Strobel Dec 17 '18 at 14:59