17

Bear with me for a while. I know this will sound subjective and argumentative at first, but I swear there is a question mark at the end, and that the question can actually be answered in an objective way...

Coming from a .NET and C# background, I have during recent years been spoiled with the syntactic sugar that generics combined with extension methods provide in many .NET solutions to common problems. One of the key features that make C# generics so extremely powerful is the fact that if there is enough information elsewhere, the compiler can infer the type arguments, so I almost never have to write them out. You don't have to write many lines of code before you realize how many keystrokes you save on that. For example, I can write

var someStrings = new List<string>();
// fill the list with a couple of strings...
var asArray = someStrings.ToArray();

and C# will just know that I mean the first var to be List<string>, the second one to be string[] and that .ToArray() really is .ToArray<string>().

Then I come to Java.

I have understood enough about Java generics to know that they are fundamentally different, above all in that the compiler doesn't actually compile to generic code: it strips the type arguments and makes it work anyway, in some (quite complicated) way that I haven't really understood yet. But even though I know generics in Java are fundamentally different, I can't understand why constructs like these are necessary:

 ArrayList<String> someStrings = new ArrayList<String>();
 // fill the list with a couple of strings...
 String[] asArray = someStrings.toArray(new String[0]); // <-- HERE!

Why on earth must I instantiate a new String[], with no elements in it, that won't be used for anything, for the Java compiler to know that it is String[] and not any other type of array I want?

I realize that this is the way the overload looks, and that toArray() currently returns an Object[] instead. But why was this decision made when this part of Java was invented? Why is this design better than, say, skipping the toArray() overload that returns Object[] entirely and just having a toArray() that returns T[]? Is this a limitation in the compiler, or in the imagination of the designers of this part of the framework, or something else?

As you can probably tell from my extremely keen interest in things of the utmost unimportance, I haven't slept in a while...

Tomas Aschan
  • http://stackoverflow.com/questions/520527/why-do-some-claim-that-javas-implementation-of-generics-is-bad provides good information – Deco Jan 05 '12 at 06:27
  • Since Java 7 the 'diamond' operator is available; this makes instantiation of generics a bit (really just a bit) more comfortable: `ArrayList<String> l = new ArrayList<>();`. – home Jan 05 '12 at 06:30
  • @home: The "diamond operator" definitely makes things a little easier. A little... ;) But I still find it quite... amusing... that Java requires me to specify the type of the collection 4 times while C# only requires it to be specified once. I still get compile errors if I do something that's not allowed in C# (e.g. try to cast a string to an integer), but *the compiler* keeps track of that for me. I can too, if I want to, but *I don't have to*. – Tomas Aschan Jan 05 '12 at 06:35
  • @Tomas Lycken: full ack, it's a mess that Java advances that slow! I guess it's all because of the fairly complex standardization around the language (JCP) as well as the full backward compatibility strategy. For Microsoft it just seems to be easier to push new features into the language. – home Jan 05 '12 at 06:40
  • 4
    @TomasLycken: C# benefited from being able to learn from a lot of the mistakes Java made along the way. If Java abandoned backward compatibility then it could also do a lot of the things that C# does well. It is pretty ridiculous the amount of declaration you need in Java, but it's an older language with a bunch of new features patched in. – Cameron Skinner Jan 05 '12 at 06:42
  • 2
    *"As you can probably tell from my extremely keen interest in things of the utmost unimportance ..."*. Is this an invitation for us to down-vote your question? :-) – Stephen C Jan 05 '12 at 07:05
  • @StephenC: No, rather an apology in advance, with a faint hope that you will show mercy ;) – Tomas Aschan Jan 05 '12 at 07:32
  • possible duplicate of [What are the differences between Generics in C# and Java... and Templates in C++?](http://stackoverflow.com/questions/31693/what-are-the-differences-between-generics-in-c-sharp-and-java-and-templates-i) – nawfal Apr 02 '13 at 22:34

7 Answers

12

No, most of these reasons are wrong. It has nothing to do with "backward compatibility" or anything like that. It's not because there's a method with a return type of Object[] (many signatures were changed for generics where appropriate). Nor is it because taking an array will save it from reallocating an array. They didn't "leave it out by mistake" or make a bad design decision. They didn't include a T[] toArray() because it can't be written with the way arrays work and the way type erasure works in generics.

It is entirely legal to declare a method of List<T> to have the signature T[] toArray(). However, there is no way to correctly implement such a method. (Why don't you give it a try as an exercise?)

Keep in mind that:

  • Arrays know at runtime the component type they were created with. Insertions into the array are checked at runtime. And casts from more general array types to more specific array types are checked at runtime. To create an array, you must know the component type at runtime (either using new Foo[x] or using Array.newInstance()).
  • Objects of generic (parameterized) types don't know the type parameters they were created with. Type parameters are replaced by their erasure (the leftmost bound, or Object if there is none), and only that is checked at runtime.

Therefore you can't create an array of a type parameter component type, i.e. new T[...].
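To make the two points above concrete, here is a small sketch (the class and method names are made up for illustration). It shows an array rejecting a wrongly typed element at runtime, while two differently parameterized lists turn out to share a single runtime class:

```java
import java.util.ArrayList;
import java.util.List;

final class ArrayVsGenericRuntimeTypes {
    /** True if storing an Integer into a String[] (viewed as Object[]) is rejected at runtime. */
    static boolean arrayRejectsWrongElement() {
        Object[] objects = new String[1]; // legal: String[] is a subtype of Object[]
        try {
            objects[0] = Integer.valueOf(1); // checked against the array's real component type
            return false;
        } catch (ArrayStoreException expected) {
            return true;
        }
    }

    /** True if a List<String> and a List<Integer> have the same runtime class. */
    static boolean listsShareRuntimeClass() {
        List<String> strings = new ArrayList<String>();
        List<Integer> integers = new ArrayList<Integer>();
        return strings.getClass() == integers.getClass();
    }

    public static void main(String[] args) {
        // The array still knows its component type at runtime.
        System.out.println(new String[0].getClass().getComponentType());
        System.out.println(arrayRejectsWrongElement());
        System.out.println(listsShareRuntimeClass());
    }
}
```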

In fact, if Lists had a method T[] toArray(), then generic array creation (new T[n]), which is not possible currently, would be possible:

List<T> temp = new ArrayList<T>();
for (int i = 0; i < n; i++)
    temp.add(null);
T[] result = temp.toArray();
// equivalent to: T[] result = new T[n];

Generics are just compile-time syntactic sugar. They can be added or removed by changing a few declarations and adding casts, without affecting the actual implementation logic of the code. Let's compare the 1.4 API and 1.5 API:

1.4 API:

Object[] toArray();
Object[] toArray(Object[] a);

Here, we just have a List object. The first method has a declared return type of Object[], and it creates an object of runtime class Object[]. (Remember that compile-time (static) types of variables and runtime (dynamic) types of objects are different things.)

In the second method, suppose we create a String[] object (i.e. new String[0]) and pass that to it. Arrays have a subtyping relationship based on the subtyping of their component types, so String[] is a subtype of Object[], so this is fine. What is most important to note here is that it returns an object of runtime class String[], even though its declared return type is Object[]. (Again, String[] is a subtype of Object[], so this is not unusual.)

However, if you try to cast the result of the first method to type String[], you will get a class cast exception, because as noted before, its actual runtime type is Object[]. If you cast the result of the second method (assuming you passed in a String[]) to String[], it will succeed.
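The difference described above is easy to observe. The following is only a sketch (names are hypothetical, and it uses ArrayList as the concrete list); it checks the runtime class of what each overload returns:

```java
import java.util.ArrayList;
import java.util.List;

final class ToArrayRuntimeClassDemo {
    static List<String> sample() {
        List<String> list = new ArrayList<String>();
        list.add("a");
        list.add("b");
        return list;
    }

    /** True if casting the result of the no-arg toArray() to String[] fails. */
    static boolean noArgCastFails() {
        Object[] result = sample().toArray(); // runtime class is Object[]
        try {
            String[] strings = (String[]) result; // checked at runtime
            return strings == null;               // not reached for ArrayList
        } catch (ClassCastException expected) {
            return true;
        }
    }

    /** Runtime class of the array returned by the one-arg overload. */
    static Class<?> oneArgRuntimeClass() {
        Object[] result = sample().toArray(new String[0]);
        return result.getClass();
    }

    public static void main(String[] args) {
        System.out.println(noArgCastFails());
        System.out.println(oneArgRuntimeClass() == String[].class);
    }
}
```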

So even though you may not notice it (both methods seem to return Object[]), there is already a big fundamental difference in the actual returned object in pre-Generics between these two methods.

1.5 API:

Object[] toArray();
<T> T[] toArray(T[] a);

The exact same thing happens here. Generics adds some nice stuff like checking the argument type of the second method at compile time. But the fundamentals are still the same: The first method creates an object whose real runtime type is Object[]; and the second method creates an object whose real runtime type is the same as the array you passed in.

In fact, if you pass in an array whose class is actually a subtype of T[], say U[], even though we have a List<T>, guess what it will do? It will try to put all the elements into a U[] array, which might succeed (if all the elements happen to be of type U) or fail (if not), and return an object whose actual type is U[].
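As a rough illustration of that last case, take a List<Number> and hand it an Integer[]. The names below are invented for the example, and the behaviour shown is what ArrayList does; other List implementations should follow the same contract:

```java
import java.util.ArrayList;
import java.util.List;

final class SubtypeArrayDemo {
    /** Copies a List<Number> into an Integer[]; only works if every element is an Integer. */
    static Object[] copyIntoIntegerArray(List<Number> numbers) {
        // The method's own type variable is inferred from the argument (Integer),
        // independently of the list's element type (Number).
        return numbers.toArray(new Integer[0]);
    }

    public static void main(String[] args) {
        List<Number> onlyIntegers = new ArrayList<Number>();
        onlyIntegers.add(1);
        onlyIntegers.add(2);
        System.out.println(copyIntoIntegerArray(onlyIntegers).getClass() == Integer[].class);

        List<Number> mixed = new ArrayList<Number>();
        mixed.add(1);
        mixed.add(2.5); // a Double cannot be stored in an Integer[]
        try {
            copyIntoIntegerArray(mixed);
        } catch (ArrayStoreException e) {
            System.out.println("mixed list could not be copied into Integer[]");
        }
    }
}
```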

So back to my point earlier. Why can't you make a method T[] toArray()? Because you don't know the type of array you want to create (either using new or Array.newInstance()).

T[] toArray() {
    // what would you put here?
}

Why can't you just create a new Object[n] and then cast it to T[]? It wouldn't crash immediately (since T is erased inside this method). But once the array is returned to the outside, and assuming the calling code asked for a specific array type, e.g. String[] strings = myStringList.toArray();, it would throw an exception, because generics insert an implicit cast there.
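Here is what that failure looks like in practice. This is a sketch with a made-up static helper, naiveToArray, standing in for the hypothetical T[] toArray() method:

```java
import java.util.ArrayList;
import java.util.List;

final class UncheckedArrayCastDemo {
    /** Hypothetical stand-in for T[] toArray(): allocates Object[] and pretends it is a T[]. */
    @SuppressWarnings("unchecked")
    static <T> T[] naiveToArray(List<T> list) {
        Object[] copy = new Object[list.size()];
        for (int i = 0; i < copy.length; i++) {
            copy[i] = list.get(i);
        }
        return (T[]) copy; // nothing fails here: T is erased inside this method
    }

    /** True if assigning the result to a String[] variable fails at the call site. */
    static boolean callSiteFails() {
        List<String> strings = new ArrayList<String>();
        strings.add("a");
        try {
            String[] array = naiveToArray(strings); // the compiler inserts a cast to String[] here
            return array == null;
        } catch (ClassCastException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(callSiteFails());
    }
}
```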

People can try all sorts of hacks, like looking at the first element of the list to try to determine the component type, but that doesn't work, because (1) elements can be null, and (2) elements can be of a subtype of the actual component type, and creating an array of that type might fail later on when you try to put other elements in. Basically, there is no good way around this.
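Since the component type has to be known at runtime, any working variant ends up taking it from the caller in some form. The sketch below is not part of the JDK's List API; it is a hypothetical helper that takes a Class object and uses Array.newInstance(), which is mentioned above as one of the two ways to create an array:

```java
import java.lang.reflect.Array;
import java.util.ArrayList;
import java.util.List;

final class ClassTokenToArray {
    /**
     * Hypothetical helper: builds a T[] whose runtime component type is taken from
     * the Class object the caller supplies. Not meant for primitive class tokens
     * such as int.class.
     */
    @SuppressWarnings("unchecked")
    static <T> T[] toArray(List<? extends T> list, Class<T> componentType) {
        T[] result = (T[]) Array.newInstance(componentType, list.size());
        int i = 0;
        for (T element : list) {
            result[i++] = element;
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> strings = new ArrayList<String>();
        strings.add("a");
        strings.add("b");
        String[] array = toArray(strings, String.class); // no ClassCastException
        System.out.println(array.length);
    }
}
```

This is the same trade as toArray(new String[0]): the caller still has to say String somewhere, because the list itself cannot.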

newacct
6

The toArray(String[]) part is there because the two toArray methods existed before generics were introduced in Java 1.5. Back then, there was no way to infer type arguments because they simply didn't exist.

Java is pretty big on backward compatibility, so that's why that particular piece of the API is clumsy.

The whole type-erasure thing is also there to preserve backward compatibility. Code compiled for 1.4 can happily interact with newer code that contains generics.
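A source-level sketch of that kind of interaction follows. It only imitates pre-generics code by using a raw List (the real compatibility story is about already-compiled class files), and all names are made up:

```java
import java.util.ArrayList;
import java.util.List;

final class RawTypeInteropDemo {
    /** Written in pre-generics style: a raw List with no type arguments. */
    @SuppressWarnings({"rawtypes", "unchecked"})
    static List legacyMakeList() {
        List list = new ArrayList();
        list.add("abc");
        return list;
    }

    /** Generics-aware code. */
    static int totalLength(List<String> strings) {
        int total = 0;
        for (String s : strings) {
            total += s.length();
        }
        return total;
    }

    /** Passes the legacy list to the generic method via an unchecked conversion. */
    @SuppressWarnings("unchecked")
    static int bridge() {
        List<String> strings = legacyMakeList(); // allowed, with an unchecked warning
        return totalLength(strings);
    }

    public static void main(String[] args) {
        System.out.println(bridge());
    }
}
```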

Yes, it's clumsy, but at least it didn't break the enormous Java code base that existed when generics were introduced.

EDIT: So, for reference, the 1.4 API is this:

Object[] toArray();
Object[] toArray(Object[] a);

and the 1.5 API is this:

Object[] toArray();
<T> T[] toArray(T[] a);

I'm not sure why it was OK to change the signature of the 1-arg version but not the 0-arg version. That seems like it would be a logical change, but maybe there's some complexity in there that I'm missing. Or maybe they just forgot.

EDIT2: To my mind, in cases like this Java should use inferred type arguments where available, and an explicit type where the inferred type is not available. But I suspect that would be tricky to actually include in the language.

Cameron Skinner
  • 1
    Good point. Maybe the .NET teams at MS were smart when they first decided to drop backwards support already between .NET 1.1 and 2.0. Their users got used to the risk of breaking changes when major versions are introduced, so when upgrading from 2.0/3.0/3.5 to 4.0 (the latest major upgrade) no one is complaining that much, and the API can be improved between versions :P – Tomas Aschan Jan 05 '12 at 06:40
  • 1
    @TomasLycken: Yep. MS actually did a pretty good job of C#. I really wish Java would drop backwards compatibility, or at least have a deprecation path for non-generic collections that would ultimately allow for non-backwards-compatible code. – Cameron Skinner Jan 05 '12 at 06:44
  • 4
    *"I'm not sure why it was OK to change the signature of the 1-arg version but not the 0-arg version."* - in effect they didn't change anything. The *erased* signature of the 1-arg version is the same as it was before. – Stephen C Jan 05 '12 at 06:56
  • 2
    *"I really wish Java would drop backwards compatibility ..."*. The IT management of your company would probably strongly disagree with you on that. Backwards compatibility is one of Java's biggest selling points for enterprise folks. – Stephen C Jan 05 '12 at 06:59
  • @StephenC: You're absolutely right, and I didn't claim that it was **practical** to drop backwards compatibility. But still, for my day-to-day work as a software engineer for a major software company, I *really* wish Java would lose some of its cruft. – Cameron Skinner Jan 05 '12 at 07:20
  • @CameronSkinner A necessary point is the signature of the 1-arg version is actually `<T> T[] toArray(T[] a)` and the class is `List<E>`. This means that the type is inferred from the argument as opposed to the collection. It's also worth noting that generics and arrays don't play well together - using `E` instead may have either been impossible due to erasure or broken the contract for pre-generic methods. –  Jan 05 '12 at 08:18
  • The zero-arg version always returned Object[], and Object[] != String[], so anybody using the returned array as Object[] would suddenly run into errors. That problem did not exist with the one-arg version, where the type depends on the provided array. Additionally, type erasure makes it impossible for the zero-arg version to return anything more concrete than Object[]. – josefx Jan 05 '12 at 12:03
  • 2
    This answer is wrong. "I'm not sure why it was OK to change the signature of the 1-arg version but not the 0-arg version." This is the whole point. It is **not possible** to correctly write a method `T[] toArray()`. They didn't "overlook" something, nor is it a library design issue. It is just not possible given the nature of arrays and type erasure of generics – newacct Jan 06 '12 at 05:44
6

Others have answered the "why" question (I prefer Cameron Skinner's answer). I will just add that you don't have to instantiate a new array each time and it does not have to be empty. If the array is large enough to hold the collection, it will be used as the return value. Thus:

String[] asArray = someStrings.toArray(new String[someStrings.size()]);

will only allocate a single array of the correct size and populate it with the elements from the Collection.
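A quick way to check this is to compare references. The sketch below uses hypothetical helper names and ArrayList as the collection:

```java
import java.util.ArrayList;
import java.util.List;

final class PresizedToArrayDemo {
    /** True if toArray hands back the very array it was given when that array is big enough. */
    static boolean reusesPresizedArray(List<String> list) {
        String[] target = new String[list.size()];
        String[] returned = list.toArray(target);
        return returned == target;
    }

    /** True if toArray hands back the zero-length array it was given. */
    static boolean reusesEmptyArray(List<String> list) {
        String[] target = new String[0];
        return list.toArray(target) == target;
    }

    public static void main(String[] args) {
        List<String> someStrings = new ArrayList<String>();
        someStrings.add("a");
        someStrings.add("b");
        System.out.println(reusesPresizedArray(someStrings)); // same array comes back
        System.out.println(reusesEmptyArray(someStrings));    // too small, so a new array is allocated
    }
}
```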

Furthermore, some of the Java collections utility libraries include statically defined empty arrays which can be safely used for this purpose. See, for example, Apache Commons ArrayUtils.

Edit:

In the above code, the instantiated array is effectively garbage when the Collection is empty. Since arrays cannot be resized in Java, empty arrays can be singletons. Thus, using a constant empty array from a library is probably slightly more efficient.
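The shared-constant idea can be sketched without any library. EMPTY_STRINGS below is a locally defined stand-in for the kind of constant such utility libraries provide, and the class name is made up:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

final class EmptyArrays {
    /** Zero-length arrays have no elements to modify, so one instance can be shared safely. */
    static final String[] EMPTY_STRINGS = new String[0];

    /** Copies the collection into a String[], allocating only when there is something to copy. */
    static String[] copy(Collection<String> strings) {
        return strings.toArray(EMPTY_STRINGS);
    }

    public static void main(String[] args) {
        List<String> empty = new ArrayList<String>();
        System.out.println(copy(empty) == EMPTY_STRINGS); // the constant itself is returned

        List<String> two = new ArrayList<String>();
        two.add("a");
        two.add("b");
        System.out.println(copy(two).length); // a fresh String[] of the right size
    }
}
```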

Brandon DuRette
5

That's because of type erasure. See "Problems with type erasure" in the Wikipedia article about Java generics: the generic type information is only available at compile time; it is completely stripped by the compiler and is absent at runtime.

So toArray needs another way to figure out what array type to return.

The example provided in that Wikipedia article is quite illustrative:

ArrayList<Integer> li = new ArrayList<Integer>();
ArrayList<Float> lf = new ArrayList<Float>();
if (li.getClass() == lf.getClass())             // evaluates to true <<==
  System.out.println("Equal");
Stephen C
Mat
  • It's not *entirely* due to erasure. When I compile code like `List<String> l = new ArrayList<String>(); l.toArray()` the compiler certainly has enough information to know to return a `String[]`. Sure, if I deserialize a `List` from a stream then I don't have any type information, but the OP seems to be asking why Java cannot infer the type when it **is** available. – Cameron Skinner Jan 05 '12 at 06:30
  • I focused entirely on the question "Why on earth must I instantiate a new String[]..." – Mat Jan 05 '12 at 06:32
  • That's fine, but erasure is still not the whole issue. You're not wrong; I just think there's slightly more to the issue than erasure. – Cameron Skinner Jan 05 '12 at 06:37
  • The Wikipedia text was really illuminating. – Tomas Aschan Jan 05 '12 at 06:38
  • @CameronSkinner It's a combination of erasure and no type inference. FWIW, IDEs remove the need for typing, and some will reduce source clutter by disappearing the instantiation's class name, like the diamond operator, but it's a source view twiddle. – Dave Newton Jan 05 '12 at 06:43
  • *"... the compiler certainly has enough information to know to return a String[]."* . Actually, it is the *runtime system* that needs to know what type of array to create ... and it doesn't due to type erasure. If you think I'm wrong, try writing a generic method that returns an array whose actual basetype is given (solely) by a type parameter. – Stephen C Jan 05 '12 at 07:12
  • @StephenC: Good point. I guess it *is* entirely due to erasure, then :) – Cameron Skinner Jan 05 '12 at 07:21
0

According to Neal Gafter, SUN simply did not have enough resources like MS did.

SUN pushed out Java 5 with type erasure because making generic type info available to runtime means a lot more work. They couldn't delay any longer.

Note: type erasure is not required for backward compatibility - that part is handled by raw types, which is a different concept from reifiable types. Java still has the chance to remove type erasure, i.e. to make all types reifiable; however it's not considered urgent enough to be on any foreseeable agenda.

irreputable
0

Java generics are really something that doesn't exist. They're only syntactic sugar (or rather hamburger?) handled by the compiler alone. In reality they're only a short(?)cut to class casting, so if you look into the bytecode you may be a bit surprised at first... Type erasure, as noted in the post above.

This shortcut to class casting seemed to be a good idea when operating on Collections. At last you could declare what type of element you're storing, and a programmer-independent mechanism (the compiler) would check it. However, using reflection you can still put whatever you want into such a collection :)
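For the curious, the reflection trick looks roughly like this. It is a sketch with made-up names; the point is that the Integer goes in without complaint and the failure only shows up later, at the cast the compiler inserted where the element is read as a String:

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

final class ReflectionBypassDemo {
    /** Returns a List<String> that actually contains an Integer, added via reflection. */
    static List<String> polluted() {
        List<String> strings = new ArrayList<String>();
        try {
            // After erasure the method is simply add(Object), so nothing checks the element type.
            Method add = List.class.getMethod("add", Object.class);
            add.invoke(strings, Integer.valueOf(42));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
        return strings;
    }

    /** True if reading the smuggled element as a String fails. */
    static boolean readAsStringFails() {
        List<String> strings = polluted();
        try {
            String first = strings.get(0); // compiler-inserted cast to String
            return first == null;
        } catch (ClassCastException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(polluted().size());
        System.out.println(readAsStringFails());
    }
}
```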

But when you write a generic strategy that works on a generic bean, and it is put into a generic service generized(?) from both bean and strategy, taking a generic listener for the generic bean etc., you're going straight into generic hell. I once ended up with four (!) generic types specified in a declaration, and when I realized I needed more, I decided to un-generize the whole code because I had run into problems with generic type compatibility.

And as for the diamond operator... You can skip the diamond and have exactly the same effect; the compiler will do the same checks and generate the same code. I doubt that will change in future versions of Java, because of the backward compatibility needed... So it's another thing that gives almost nothing, while Java has much bigger problems to deal with, e.g. the extreme inconvenience of operating with date-time types...

Danubian Sailor
-1

I can't speak to the JDK team's design decisions, but a lot of the clumsy nature of Java generics comes from the fact that generics were a part of Java 5 (1.5 - whatever). There are plenty of methods in the JDK which suffer from the attempt to preserve backwards compatibility with the APIs which pre-dated 1.5.

As for the cumbersome List<String> strings = new ArrayList<String>() syntax, I wager it is to preserve the pedantic nature of the language. Declaration, not inference, is the Java marching order.

strings.toArray( new String[0] );? The type cannot be inferred because the Array is itself considered a generic class (Array<String>, if it were exposed).

All in all, Java generics mostly exist to protect you from typing errors at compile-time. You can still shoot yourself in the foot at runtime with a bad declaration, and the syntax is cumbersome. As happens a lot, best practices for generics use are the best you can do.

sarumont
  • 1
    "Array is itself considered a generic class" not really. arrays know their component type at runtime; while generic types don't – newacct Jan 06 '12 at 05:49