
Java introduced type erasure with generics in Java 5 so that generic code would run on older versions of the JVM. It was a tradeoff for compatibility. We've since lost that compatibility[1] [2] [3]: bytecode can be run on later versions of the JVM but not earlier ones. This looks like the worst possible choice: we've lost type information and we still can't run bytecode compiled for newer versions of the JVM on older versions. What happened?

Specifically, I'm asking if there are any technical reasons why type erasure couldn't be removed in the next version of the JVM (assuming, like previous releases, its bytecode won't be able to run on the previous version anyway).

[3]: Type erasure could be backported in a manner similar to retrolambda for those who really like it.

Edit: I think the discussion of the definition of backwards vs. forwards compatibility is obscuring the question.

Prime
  • Backwards compatibility was never lost. An old Java 1.1 program will most probably still run smoothly on a JRE 8 VM, which is what the linked posts say. – Tunaki Jun 27 '16 at 13:59
  • @Tunaki correct me if I'm wrong, but that's forwards compatibility: your code will run on future JVMs. If we remove type erasure in JVM version X, it'll still be removed in JVM X+1, so our code will still run. – Prime Jun 27 '16 at 14:00
  • Not exactly, refer also to http://stackoverflow.com/questions/4692626/is-jdk-upward-or-backward-compatible – Tunaki Jun 27 '16 at 14:03
  • @Tunaki I'm aware of that answer and I think we're confusing terms rather than ideas. It's patently obvious that certain Java 8 features will not work on JVM 7, hence the existence of projects like retrolambda. Here's a list of incompatibilities from Oracle: http://www.oracle.com/technetwork/java/javase/8-compatibility-guide-2156366.html#A999387 . Quoting it: "Class files built with the Java SE 8 compiler will not run on earlier releases of Java SE". – Prime Jun 27 '16 at 14:06
  • @Prime - but that is NOT what backwards compatibility means in the context of Java. – Stephen C Jun 27 '16 at 22:25
  • Java 8 is backwards compatible with code written for older versions of Java. You may think of your code being forwards compatible with future JREs, but it is actually those future JREs that are backwards compatible with your old code. Your question shows a lack of understanding what backwards compatible means in Java. – Andreas Jul 07 '16 at 17:34
  • @Prime you are correct, posters are wrong. The code you write is forward compatible: it compiles and runs on JVM 8, 9, 10, 11... The bytecode generated is not backward compatible, i.e. you cannot run it on JVM 5, 4, 3, 2. The JVM itself is backward compatible because it can run JVM 5, 4, 3, 2 code, and so is the JDK because it supports compilation of older code. Type erasure is a known problem, and research and discussion is going on to either change it or remove it. – Simple Fellow May 14 '18 at 18:18

3 Answers


Type erasure is more than just a byte code feature that you can turn on or off.

It affects the way the entire runtime environment works. If you want to be able to query the generic type of every instance of a generic class, it implies that meta information, comparable to a runtime Class representation, is created for each object instantiation of a generic class.

If you write `new ArrayList<String>(); new ArrayList<Number>(); new ArrayList<Object>()` you are not only creating three objects, you are potentially creating three additional meta objects reflecting the types `ArrayList<String>`, `ArrayList<Number>`, and `ArrayList<Object>`, if they didn't exist before.

Consider that there are thousands of different List signatures in use in a typical application, most of them never used in a place where the availability of such Reflection is required (since this feature is absent today, we can conclude that currently all of them work without such Reflection).

This, of course, multiplies: a thousand different generic list types imply a thousand different generic iterator types, a thousand Spliterator and Stream incarnations, not even counting the internal classes of the implementation.
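To ground the point about meta objects, here is a minimal sketch (standard Java only, nothing assumed beyond the JDK) of the status quo the answer starts from: under erasure, every parameterization of `ArrayList` shares a single runtime class, so no per-parameterization metadata exists today.

```java
import java.util.ArrayList;

public class ErasureStatusQuo {
    public static void main(String[] args) {
        ArrayList<String> strings = new ArrayList<>();
        ArrayList<Number> numbers = new ArrayList<>();
        // One runtime class serves every parameterization; reified generics
        // would need a distinct meta object per instantiated type instead.
        System.out.println(strings.getClass() == numbers.getClass()); // true
        System.out.println(strings.getClass().getName());             // java.util.ArrayList
    }
}
```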

And it even affects places without an object allocation which currently exploit type erasure under the hood: e.g. `Collections.emptyList()`, `Function.identity()` or `Comparator.naturalOrder()` return the same instance each time they are invoked. If you insist on having the particular captured generic type reflectively inspectable, this won't work anymore. So if you write

List<String> stringList = Collections.emptyList();
List<Number> numberList = Collections.emptyList();

you would have to receive two distinct instances, each of them reporting a different type on `getClass()` or the future equivalent.
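The sharing that erasure enables can be observed directly today; a minimal sketch using only the JDK:

```java
import java.util.Collections;
import java.util.List;

public class SharedEmpty {
    public static void main(String[] args) {
        List<String> strings = Collections.emptyList();
        List<Number> numbers = Collections.emptyList();
        // Both calls return the one immutable singleton; with reified
        // generics each captured type would require its own instance.
        System.out.println((Object) strings == (Object) numbers); // true
    }
}
```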


It seems that people wishing for this ability have a narrow view on their particular method, where it would be great if they could reflectively find out whether one particular parameter is actually one out of two or three types, but never think about the weight of carrying meta information about potentially hundreds or thousands of generic instantiations of thousands of generic classes.

This is the place where we have to ask what we gain in return: the ability to support a questionable coding style (this is what altering the code’s behavior due to information found via Reflection is all about).


The answer so far only addressed the easy aspect of removing type erasure, the desire to introspect the type of an actual instance. An actual instance has a concrete type, which could be reported. As mentioned in this comment from the user the8472, the demand for removal of type erasure often also implies the wish to be able to cast to `(T)`, create an array via `new T[]`, or access the type of a type variable via `T.class`.

This would raise the true nightmare. A type variable is a different beast than the actual type of a concrete instance. A type variable could resolve to, e.g., `? extends Comparator<? super Number>`, to name one (rather simple) example. Providing the necessary meta information would imply that not only would object allocation become much more expensive, every single method invocation could impose these additional costs, to an even bigger extent, as we are now not only talking about the combination of generic classes with actual classes, but also every possible wildcarded combination, even of nested generic types.

Keep in mind that the actual type of a type parameter could also refer to other type parameters, turning type checking into a very complex process, which you not only have to repeat for every type cast: if you allow creating an array out of it, every storage operation has to repeat it as well.
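The array side of this can be felt in today's Java from the opposite direction: since `T` is erased, `new T[n]` doesn't compile, and the usual workaround gives up exactly the per-store check that real arrays perform. A small sketch (the class name is illustrative):

```java
public class GenericBuffer<T> {
    // new T[10] is rejected by javac; the erased workaround creates an
    // Object[] behind an unchecked cast, so stores of T are never checked.
    @SuppressWarnings("unchecked")
    private final T[] data = (T[]) new Object[10];

    void put(int i, T value) { data[i] = value; } // no runtime check against T

    public static void main(String[] args) {
        GenericBuffer<String> buf = new GenericBuffer<>();
        buf.put(0, "hello");

        // Real arrays, by contrast, repeat a type check on every store:
        Object[] objs = new String[1];
        try {
            objs[0] = Integer.valueOf(42);
        } catch (ArrayStoreException e) {
            System.out.println("ArrayStoreException"); // checked per store
        }
    }
}
```

Reified type variables would have to bring this per-store checking to every generic array, with far more complex types to verify.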

Besides the heavy performance issue, the complexity raises another problem. If you look at the bug tracking list of javac or related questions on Stack Overflow, you may notice that the process is not only complex but also error prone. Currently, every minor version of javac contains changes and fixes regarding generic type signature matching, affecting what will be accepted or rejected. I'm quite sure you don't want intrinsic JVM operations like type casts, variable assignments or array stores to become victims of this complexity, having a different idea of what is legal or not in every version, or suddenly rejecting at run time what javac accepted at compile time due to mismatching rules.

Holger
  • you are assuming that it would only be used via reflection, but non-erased generics would mean that the type variables in classes (e.g. T) would be real, i.e. casting via `(T)` would be an actual cast and provide fail-fast behavior, `new T[]` would create an array, and `T.class` could give you a class object. Also, while you're right that the metadata objects would need to be created, they would still only be a constant factor over the number of classes, since the generic signatures are ultimately driven by allocation callsites, of which there is only a finite amount per class. – the8472 Jun 27 '16 at 21:51
  • @the8472: I focused on the *easy* part, as what you are describing makes things much worse. Since there are also generic *methods*, the features you describe would imply that each *invocation* can bear that overhead, not only object allocation sites. I expanded my answer to address some of the related issues. Of course, the number of call sites is finite as well, but the number of stars in our universe might be finite too… – Holger Jun 28 '16 at 09:01
  • good point, distinguishing between type bounds and concrete types adds another source of complexity. – the8472 Jun 28 '16 at 17:46
  • This answer is not convincing. Meta information *is* available at runtime (through `getGenericType()`), so it's already managed; we are just not allowed to write (non-reflective) code that _uses_ it. Regarding performance, you'll have to compare the compound runtime with all the nasty workarounds that exist to get nice designs out of Java generics, not with a world where you don't use them (or don't try to work around the limitations). – Raphael Mar 20 '18 at 08:04
  • @Raphael `getGenericType()` will only provide you the *declaration*, not the actual type of an instantiation. You can use Reflection to find out that the declaration of the *type* `java.util.List` is `List`; that doesn’t say anything about the actual parameterization of the thousands of list *instances*. You can query the generic type of variables, i.e. fields and parameters (not local variables), but the same object can be referenced by dozens of different variables of different type; the object itself does not have a generic type. As shown by example in the answer. – Holger Mar 20 '18 at 08:18
  • @Holger Ah, I misunderstood. Thanks. – Raphael Mar 20 '18 at 09:35
  • @Holger *getGenericType() will only provide you the declaration, not the actual type of an instantiation* that's a great point here. I don't understand why people say that type erasure erases everything, while `public static int size(List<String> list)` compiles to `public static int size(java.util.List)`. So type erasure works at call-sites, really, in my understanding at least... – Eugene Aug 31 '18 at 11:21
  • @Eugene First of all, Generics do not alter the way compiled code works at all. Letting Reflection aside, you can run compiled generic code on a JVM which doesn’t know anything about Generics at all. That’s why for `int size(List<String> list)`, the generic signature will be stored, but you still can’t have a method `int size(List<Integer> list)` in the same class, as having two `int size(List)` methods in one class is forbidden. – Holger Aug 31 '18 at 13:46
  • @Holger I understand that (I think). My point was that the generic declaration is *stored* after compilation in the bytecode, but it is *erased* from the bytecode at all call-sites that use it. And that example with `size(List list)` could work if the return type were different, with a different compiler than `javac` – Eugene Aug 31 '18 at 14:09
  • I don't accept these trade-offs as either necessary or inevitable. It seems to me a slightly intelligent system could easily be implemented where, as long as the class in the generic collection was a final type, the type was remembered. Even if that was the only kind of type reification implemented, it would provide immense benefits: casting `List<String>`, for instance, instead of `List<?>`, which is so useless as to be worthless. – ggb667 May 08 '19 at 18:04
  • @ggb667 A type system that works only in a few cases is not worth the effort. When I write `List<String> l = List.of("foo", "bar");`, I’m invoking a *generic* method whose implementation code instantiates a *generic* `List`, without a hint that the type is the `final` type `String`. What if I write `List<CharSequence> l = List.of("foo", "bar");` instead? Exactly the same code, but now, the code within the `of` method should magically determine that this is *not* a list with a `final` element type? How many levels should the magic propagate? `stream.map(f).collect(toList())`, `final` type or not? – Holger May 09 '19 at 07:59
  • Is a system that works in NONE of the cases better? The benefit desired (IMHO) is methods which, when passed different types, are handled intelligently, not that they are generic intrinsically. Casts defeat the entire purpose. What we have is (effectively) `List<?>`, which is what many dislike intensely. – ggb667 May 09 '19 at 17:09
  • @ggb667 You don’t have `List<?>`, you have a tool for compile-time checking of the correctness of generic code. The compiled code only has a `List`. You want some kind of uber-reflection feature, which generics were never meant for. Implementing such a feature would be a lot of heavy work, which the Java core developers won’t do just for the sake of handling a few corner cases. There is no point in making further attempts to convince me; I’m just telling the status quo. – Holger May 09 '19 at 17:31
  • Given that C# reifies generics, presumably it must also deal with the overhead of "hundreds or thousands generic instantiations of thousands of generic classes". Yet, AFAIK, its performance and memory usage are comparable to Java. So perhaps this concern is overstated? – Paul Carey Sep 18 '20 at 08:53
  • @PaulCarey Does it have an equivalent of `Function.identity()`? When I do `Function<String,String> f1 = Function.identity(); Function<Integer,Integer> f2 = Function.identity(); Function.identity().getType();` what will it provide? – Holger Sep 18 '20 at 10:18
  • @Holger I have no real knowledge of C#; I'm making a black-box type observation that the presence of reified generics in a widely used language suggests that the runtime overhead objection may be successfully surmounted. – Paul Carey Sep 19 '20 at 07:08
  • Also, if you've ever used Jackson's `TypeReference`, you should know that its sole purpose is to cheat type erasure, making the generic type present at runtime by creating - whoopsie! - not an object with meta information about the type, but a completely new type. And every lambda does the same, by the way. – Etki Jan 05 '21 at 17:21

To some extent, erasure will be removed in the future with Project Valhalla, to enable specialized implementations for value types.

Or to put it more accurately, type erasure really means the absence of type specialization for generics, and Valhalla will introduce specialization over primitives.

Specifically I'm asking if there are any technical reasons why type erasure couldn't be removed in the next version of the JVM

Performance. You don't have to generate specialized code for all combinations of generic types; instances and generated classes don't have to carry type tags; polymorphic inline caches and runtime type checks (compiler-generated `instanceof` checks) stay simple; and we still get most of the type safety through compile-time checks.
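"Runtime type checks stay simple" is visible in today's `instanceof`: only the erased type can be tested, which amounts to a single class comparison. A minimal sketch:

```java
import java.util.ArrayList;
import java.util.List;

public class RawInstanceof {
    public static void main(String[] args) {
        Object o = new ArrayList<String>();
        // Only the raw (erased) type is testable at run time:
        System.out.println(o instanceof List);    // true
        System.out.println(o instanceof List<?>); // true; unbounded wildcard is allowed
        // o instanceof List<String> does not compile: the parameterization
        // is gone at run time, so there is nothing for the JVM to test.
    }
}
```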

Of course there are also plenty of downsides, but the tradeoff has already been made, and the question is what would motivate the JVM devs to change that tradeoff.

And it might also be a compatibility thing: there could be code that performs unchecked casts to abuse generic collections by relying on type erasure, which would break if the type constraints were enforced.
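To make that compatibility concern concrete, here is the kind of unchecked-cast code that works today only because the cast is erased to a no-op; a JVM that enforced type parameters at run time would reject the cast (or the `add`) and break such programs:

```java
import java.util.ArrayList;
import java.util.List;

public class HeapPollution {
    public static void main(String[] args) {
        List<String> strings = new ArrayList<>();
        @SuppressWarnings("unchecked")
        List<Integer> ints = (List<Integer>) (List<?>) strings; // erased: a no-op at run time
        ints.add(42); // heap pollution: the "String" list now holds an Integer

        try {
            String s = strings.get(0); // the compiler-inserted cast fails only here
            System.out.println(s);
        } catch (ClassCastException e) {
            System.out.println("ClassCastException");
        }
    }
}
```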

the8472
  • Why does having values of generic type parameters at runtime imply specialization? – Raphael Mar 20 '18 at 08:00
  • Your last paragraph is very insightful. I've heard that type erasure breaks encapsulation, but I haven't yet found an example of that. Maybe by abusing type erasure by casting to a raw collection it would be possible to create a type and then cast the object to that in such a way so one could get access to otherwise encapsulated object state. Or maybe using reflection. I don't know. – Igor Donin Apr 24 '22 at 12:40

Your understanding of backwards compatibility is wrong.

The desired goal is for new JVMs to be able to run old library code correctly and unchanged, even alongside new code. This allows users to upgrade their Java versions reliably, even to versions much newer than the code was written for.

Thorbjørn Ravn Andersen
  • How would removing type erasure hinder this? Old code wouldn't rely on type information (because it doesn't exist) and new code would only be run on the new JVM. Everyone could upgrade seamlessly. – Prime Jun 27 '16 at 14:24
  • @Prime - It wouldn't. However you use your bogus idea of backwards compatibility as the justification for throwing out >>real<< backwards compatibility. Also, your bold assertion that "everyone could seamlessly upgrade" requires a lot more justification. I for one DO NOT believe it. – Stephen C Jun 27 '16 at 22:21
  • @StephenC type erasure is subtractive--if your program assumes type information has been erased, it will run fine so long as there is a java.util.List[Object] in the classpath. Adding more type information might mean something like adding specializations, so the JVM would be aware of e.g. java.util.List[Integer] and could avoid runtime type-checking. It is possible to get this information at runtime by using reified types (as scala does, see http://docs.scala-lang.org/overviews/reflection/typetags-manifests.html). Just like how you can always get information available at compile... – Prime Jun 28 '16 at 11:09
  • time at runtime if you are determined enough (in the case of scala, by having the compiler add it all in for you). I'm not sure this would be a *good* idea, but my question was about if there were any technical barriers to doing it. And the answers given thus far have been enlightening in that respect. – Prime Jun 28 '16 at 11:10