[Update: new insights; it felt like something was missing until now]
Regarding the earlier answer:
- Arrays are covariant, just like other types can be. Covariance is what lets you write things like 'object[] foo = new string[5];' (see the sketch right after this list), so that is not the reason.
- Compatibility is probably the reason for not reconsidering the design, but I argue this is also not the correct answer.
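A minimal sketch of that covariance (plain snippet, assuming a using System; directive): assigning a string[] to an object[] compiles, and the runtime still checks every element write:

object[] foo = new string[5];
foo[0] = "this works";            // fine: a string goes into what is really a string[]
try
{
    foo[1] = new object();        // compiles, but the runtime rejects it
}
catch (ArrayTypeMismatchException e)
{
    Console.WriteLine(e.Message);
}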
However, the other reason I can think of is that an array is the 'basic type' for a linear set of elements in memory. I've been thinking of it as Array<T>, at which point you might just as well wonder why every T is an Object and why this 'Object' even exists. In that view, T[] is simply another syntax for Array<T>, which is covariant with Array. Since the concrete type and the base type differ in both cases, I consider the two cases similar.
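You can see that relationship with a couple of reflection checks (a small sketch, nothing hypothetical in it):

// Every T[] has System.Array as its base class.
Console.WriteLine(typeof(int[]).BaseType);      // System.Array
Console.WriteLine(typeof(string[]).BaseType);   // System.Array

// So any array, whatever its element type, can be treated as an Array.
Array asArray = new int[] { 1, 2, 3 };
Console.WriteLine(asArray.Length);              // 3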
Note that neither a basic Object nor a basic Array is a requirement for an OO language; C++ is the perfect example of this. The caveat of not having a basic type for these basic constructs is that you cannot work with arrays or objects through reflection. For objects you're used to writing classes like Foo, which makes a base 'object' feel natural. In reality, not having an Array base class makes working with an arbitrary Foo[] through reflection equally impossible -- not as frequently used, but equally important for the paradigm.
Therefore, having C# without an Array base type, but with the riches of its runtime type system (particularly reflection), is IMO impossible.
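To make that concrete, here is roughly what 'working with arrays through reflection' looks like when all you have is the Array base class (a sketch; Array.CreateInstance, SetValue and GetValue are the real APIs, the element type is just an example):

// Create and fill an array of a type we only know at run time.
Type elementType = typeof(double);                  // pretend this came from reflection
Array data = Array.CreateInstance(elementType, 4);  // the reflection equivalent of new double[4]

for (int i = 0; i < data.Length; i++)
{
    data.SetValue(i * 1.5, i);                      // boxed write through the base class
}

Console.WriteLine(data.GetValue(2));                // 3

Without an Array base type there would be nothing to hang these operations on.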
So more into the details...
Where are arrays used and why are they arrays
A basic type for something as fundamental as an array is used for a lot of things, and with good reason:
Yea well, we already knew that people use T[], just like they use List<T>. Both implement a common set of interfaces, to be exact: IList<T>, ICollection<T>, IEnumerable<T>, IList, ICollection and IEnumerable.
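As a quick sanity check (just a sketch, assuming the usual using System.Collections; and using System.Collections.Generic; directives), an int[] really does satisfy every one of them:

int[] numbers = { 1, 2, 3 };

// All of these assignments compile, because T[] implements all six interfaces.
IList<int> genericList = numbers;
ICollection<int> genericCollection = numbers;
IEnumerable<int> genericSequence = numbers;
IList list = numbers;
ICollection collection = numbers;
IEnumerable sequence = numbers;

Console.WriteLine(genericList.Count);   // 3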
You can easily create an Array if you know this. We all know this to be true, and it's not exciting, so we're moving on...
If you dig into List<T> you will end up with an array eventually - to be exact: a T[].
So why is that? While you could have used a pointer structure (LinkedList), it's just not the same. Lists are contiguous blocks of memory and get their speed from being a contiguous block of memory. There are a lot of reasons for this, but simply put: processing contiguous memory is the fastest way of processing memory - there are even instructions in your CPU that make it faster.
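If you want to see the effect yourself, here is a rough micro-benchmark sketch (assumes using System.Diagnostics; and using System.Collections.Generic;; the absolute numbers depend entirely on your machine and are only indicative):

const int N = 10_000_000;
int[] array = new int[N];
var linked = new LinkedList<int>();
for (int i = 0; i < N; i++) { array[i] = i; linked.AddLast(i); }

var sw = Stopwatch.StartNew();
long arraySum = 0;
foreach (int x in array) arraySum += x;     // contiguous memory: cache- and prefetch-friendly
Console.WriteLine($"array:      {sw.ElapsedMilliseconds} ms (sum {arraySum})");

sw.Restart();
long linkedSum = 0;
foreach (int x in linked) linkedSum += x;   // pointer chasing: every node can live anywhere on the heap
Console.WriteLine($"LinkedList: {sw.ElapsedMilliseconds} ms (sum {linkedSum})");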
A careful reader might point out that you don't need an array for this, but rather a contiguous block of elements of type 'T' that IL understands and can process. In other words, you could get rid of the Array type here, as long as you make sure there's another type that IL can use to do the same thing.
Note that there are value types and reference types. To retain the best possible performance you want to store value types in the block as such... and for marshalling it's simply a requirement.
Marshalling uses basic types that all languages agree upon to communicate. These basic types are things like byte, int, float, pointer... and array. Most notable is the way arrays are used in C/C++, which is like this:
for (Foo *foo = beginArray; foo != endArray; ++foo)
{
    // use *foo -> which is the element in the array of Foo
}
Basically this sets a pointer to the start of the array and increments the pointer (by sizeof(Foo) bytes) until it reaches the end of the array. The element is retrieved with *foo, which gives the element the pointer 'foo' is pointing at.
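The same pattern is available in C# too, if you pin the array (a sketch; it needs a project that allows unsafe code):

int[] values = { 1, 2, 3, 4 };

unsafe
{
    // Pin the array so the GC cannot move it, then walk it exactly like the C loop above.
    fixed (int* begin = values)
    {
        int* end = begin + values.Length;
        for (int* p = begin; p != end; ++p)
        {
            Console.WriteLine(*p);   // *p is the element p currently points at
        }
    }
}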
Note again that there are value types and reference types. You really don't want a MyArray that simply stores everything boxed as an object. Implementing MyArray just got a hell of a lot more tricky.
Some careful readers might point out that you don't really need an array here either, which is true. You need a contiguous block of elements of type Foo - and if Foo is a value type, it must be stored in the block as the (byte representation of the) value type.
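A sketch of the storage difference (plain arrays, no hypothetical MyArray type needed to see the problem):

// 'Store everything as object': every int becomes a separate boxed heap object,
// and every read needs an unbox.
object[] boxedStorage = new object[3];
boxedStorage[0] = 42;                 // boxing: allocates a heap object for the int
int x = (int)boxedStorage[0];         // unboxing: runtime type check + copy back out

// A real int[] keeps the 4-byte values back-to-back in a single block: no boxing at all.
int[] valueStorage = new int[3];
valueStorage[0] = 42;
int y = valueStorage[0];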
So more... What about multi-dimensionality? Apparently the rules aren't so black and white, because suddenly we don't have all the interfaces anymore:
int[,] foo2 = new int[2, 3];
foreach (var type in foo2.GetType().GetInterfaces())
{
    Console.WriteLine("{0}", type.ToString());
}
Strong typing just went out of the window: you end up with only the non-generic collection types IList, ICollection and IEnumerable. Hey, how are we supposed to get the size then? When using the Array base class, we could have used this:
Array array = foo2;
Console.WriteLine("Length = {0},{1}", array.GetLength(0), array.GetLength(1));
... but if we look at the alternatives like IList, there's no equivalent. How are we going to solve this? Should we introduce an IList<int, int> here? Surely that's wrong, because the element type is just int. What about an IMultiDimensionalList<int>? We could do that and fill it up with the methods that are currently on Array.
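Just to show what we would have to invent, a hypothetical sketch (IMultiDimensionalList<T> does not exist anywhere in .NET; the members simply mirror what Array already offers, and it assumes using System.Collections.Generic;):

// Hypothetical interface - purely an illustration, not part of the BCL.
public interface IMultiDimensionalList<T> : IEnumerable<T>
{
    int Rank { get; }                        // number of dimensions, like Array.Rank
    int GetLength(int dimension);            // like Array.GetLength
    T Get(params int[] indices);             // like Array.GetValue, but strongly typed
    void Set(T value, params int[] indices); // like Array.SetValue, but strongly typed
}

In other words: we would end up re-specifying a good chunk of Array just to get rid of it.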
Have you noticed that there are special calls for reallocating arrays? This has everything to do with memory management: arrays are so low-level that they don't understand what growing or shrinking is. In C you would use 'malloc' and 'realloc' for this, and you really should implement your own 'malloc' and 'realloc' once to understand why having fixed sizes is important for everything you allocate directly.
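You can see that 'fixed size' nature in the BCL itself: Array.Resize does not grow the existing array, it allocates a new one and copies the elements over (a small sketch):

int[] numbers = { 1, 2, 3 };
int[] original = numbers;

Array.Resize(ref numbers, 10);   // allocates a new int[10] and copies the 3 elements into it

Console.WriteLine(numbers.Length);                            // 10
Console.WriteLine(object.ReferenceEquals(numbers, original)); // False: it is a different block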
If you look at it, there are only a couple of things that get allocated with a 'fixed' size: arrays, all basic value types, pointers and classes. Apparently we handle arrays differently, just like we handle basic types differently.
A side note about type safety
So why do we need all these 'access point' interfaces in the first place?
The best practice in all cases is to provide users with a type-safe point of access. This can be illustrated by comparing code like this:
array.GetType().GetMethod("GetLength").Invoke(array, new object[] { 0 }); // don't...
to code like this:
((Array)array).GetLength(0); // do!
Type safety enables you to be sloppy when programming. If used correctly, the compiler will find the error if you made one, instead of you finding out at run time. I cannot stress enough how important this is - after all, your code might not be covered by a test case at all, while the compiler will always evaluate it!
Putting it all together
So... let's put it all together. We want:
- A strongly typed block of data
- That has its data stored contiguously
- IL support to make sure we can use the cool CPU instructions that make it bleeding fast
- A common interface that exposes all the functionality
- Type safety
- Multi-dimensionality
- We want value types to be stored as value types
- And the same marshalling structure as any other language out there
- And a fixed size because that makes memory allocation easier
That's quite a list of low-level requirements for any collection... it requires memory to be organized in a certain way, as well as a mapping to IL and CPU instructions... I'd say there's a good reason it's considered a basic type.