Why does the CLR re-use empty strings, but not empty arrays?

Question

I notice that

Console.WriteLine((object) new string(' ', 0) == (object) new string(' ', 0));

prints true, which indicates that the CLR keeps the empty string around and re-uses the same instance. (It prints false for any length other than 0.)

However, the same is not true for arrays:

Console.WriteLine(new int[0] == new int[0]);   // False
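The two observations above can be reproduced side by side. This is a minimal sketch, assuming a runtime that behaves as described in the question. The class and method names are made up for illustration, and `object.ReferenceEquals` is used so that the comparison is explicitly by reference rather than by value.

```csharp
using System;

// Hypothetical demo class; none of these names come from the framework.
public static class EmptyInstanceDemo
{
    // True when two separately constructed strings of the given length are the same object.
    public static bool StringsShareInstance(int length)
    {
        return object.ReferenceEquals(new string(' ', length), new string(' ', length));
    }

    // True when two separately constructed int arrays of the given length are the same object.
    public static bool ArraysShareInstance(int length)
    {
        return object.ReferenceEquals(new int[length], new int[length]);
    }

    public static void Main()
    {
        Console.WriteLine(StringsShareInstance(0)); // True, as reported in the question
        Console.WriteLine(StringsShareInstance(1)); // False
        Console.WriteLine(ArraysShareInstance(0));  // False
    }
}
```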

Now, if we look at the implementation of Enumerable.Empty<T>(), we find that it caches and re-uses empty arrays:

public static IEnumerable<TResult> Empty<TResult>()
{
    return EmptyEnumerable<TResult>.Instance;
}

[...]

public static IEnumerable<TElement> Instance
{
    get
    {
        if (EmptyEnumerable<TElement>.instance == null)
            EmptyEnumerable<TElement>.instance = new TElement[0];
        return EmptyEnumerable<TElement>.instance;
    }
}

So the framework team felt that keeping an empty array around for every type is worth it. The CLR could, if it wanted to, go a small step further and do this natively so it applies not only to calls to Enumerable.Empty<T>() but also new T[0]. If the optimisation in Enumerable.Empty<T>() is worth it, surely this would be even more worth it?

Why does the CLR not do this? Is there something I’m missing?

I think the biggest anomaly is that `new string(...)` can return an existing reference, rather than that it doesn't for arrays. This has always seemed like an oddity to me. — Jon Skeet, Oct 27 '11 at 12:16
@JonSkeet: It would be unusual if the string constructor were implemented in IL, but it isn’t, and returning an existing reference doesn’t violate any semantics, so I don’t find it odd at all. — Timwi, Oct 27 '11 at 12:19
If the array returned by Empty is modified by adding a few items, does that affect the cached empty array (the cached reference)? If so, how does the next `Empty` call pass the `EmptyEnumerable.instance == null` check? — sll, Oct 27 '11 at 12:20
@sll Basic arrays can't have items added or removed. Once sized they are sized for good. — Adam Houldsworth, Oct 27 '11 at 12:22
@Adam Houldsworth: thanks! I forgot this important thing! — sll, Oct 27 '11 at 12:25
@Timwi: The behaviour violates the C# language specification. From section 7.6.10: "The new operator is used to create new instances of types." and "The new operator implies creation of an instance of a type." (There are probably more examples I could cite.) Checking for violation of ECMA-335 as well... — Jon Skeet, Oct 27 '11 at 12:25
FWIW, you'll often find `private static readonly SomeType[] nix = new SomeType[0];` littered in my code ;p — Marc Gravell, Oct 27 '11 at 12:26
@Jon there are other ways to subvert `new`, of course. While it is *an* exception, it isn't isolated (although the other exceptions are in the "forget the rules, do what you want" namespace allowance) — Marc Gravell, Oct 27 '11 at 12:28
@MarcGravell: Surely instead of littering every type with that, you could just declare a `public static class Empty<T> { public static readonly T[] Array = new T[0]; }` and then access `Empty<T>.Array` etc.? — Timwi, Oct 27 '11 at 12:29
@Timwi: And it looks to me like it violates section 4.21 of partition III of ECMA-335: "The newobj instruction creates a new object or a new instance of a value type." It's clearly not doing that here. — Jon Skeet, Oct 27 '11 at 12:30
@MarcGravell: Are you thinking of instantiating interfaces via COM? I've always found that somewhat surprising too... — Jon Skeet, Oct 27 '11 at 12:31
@Timwi: Unfortunately, static member access on generic types is (by comparison) quite slow. This would more than eliminate any optimization you might get by caching the value. — Adam Robinson, Oct 27 '11 at 12:32
@Jon that was one of them; but I was actually thinking of `ProxyAttribute` - inherit that, and you can return anything you like for `new` (via `CreateInstance`). — Marc Gravell, Oct 27 '11 at 12:33
@AdamRobinson: Surely only the first time it is accessed for any particular argument type. After that, the JITter turns it into a direct memory access, which is blazing fast. Or am I wrong about that? — Timwi, Oct 27 '11 at 12:35
@Jon - to play devil's advocate, how does the spec indicate that new objects or instances of value types can be distinguished from existing objects or instances? That is, is there any requirement that a "new object" will not be reference equal to an existing object? — kvb, Oct 27 '11 at 15:30
@kvb: A reference isn't an object. The spec talks about value type values slightly separately. A reference *isn't* an object - the spec talks about creating a new object and returning a reference to that object, for reference types. — Jon Skeet, Oct 27 '11 at 15:32
@Jon - Fair enough, let me try to rephrase: when the spec talks about "creating a new object and returning a reference to that object", is there a guarantee that that reference will not equal any existing references? It seems like a desirable/intuitive property, but if it's not actually guaranteed then I don't think that the runtime's behavior violates the spec. — kvb, Oct 27 '11 at 17:02
@kvb: If it returned an existing reference, that wouldn't be a reference to a new object, would it? Either it refers to a new object or it doesn't - I don't think there's any real wiggle-room there. — Jon Skeet, Oct 27 '11 at 17:56
@Jon - If "new object" isn't precisely defined by the spec, then I could argue that it doesn't _have_ to be inconsistent with interning immutable objects. Like I said, I'm mostly playing devil's advocate. — kvb, Oct 27 '11 at 18:03
@kvb: I can't think of any possible interpretation of "new" to mean "existing". I'm with Jon on this one; if calling `new` has a case for where an existing object should be referenced, then it seems like a factory approach (which makes no such guarantees, and there's a general assumption that such behavior is to be expected) seems more appropriate. — Adam Robinson, Oct 27 '11 at 21:43
@Adam - Let's say that there were no way to compare object references, but that as an implementation detail the runtime did reuse references. Would that violate the spec? — kvb, Oct 28 '11 at 02:54
@Adam - now let's go back to the situation where `Object.ReferenceEquals` exists. Isn't it arguable that if the spec doesn't require that the method return false when comparing a reference to a new object to a reference to an existing object, that reusing references for immutable objects could still be just an implementation detail and not a violation of the spec? — kvb, Oct 28 '11 at 03:00
@kvb: If there were no way to compare references, then information like "new instance" would just be noise. Furthermore, the entire point of `ReferenceEquals` is to determine *reference equality*; what use would it be if it didn't return `false` for two different references? I suppose that given your suppositions that it wouldn't violate the spec, but they seem like they're just arbitrarily defined in order to back up the position rather than being reasonable possibilities. — Adam Robinson, Oct 28 '11 at 12:21
@kvb: If there existed a static `Object.IsNull(Object)`, but otherwise `ReferenceEquals` was a protected member of `Object`, and if the `Monitor` functions were only usable on some particular derivative of `Object`, then existing instances of immutable objects might be substitutable for new ones. But neither condition applies. — supercat, Nov 19 '12 at 18:47

H H · Accepted Answer · 2011-10-27T12:31:28.993

9

Strings may be interned, which makes them a different story from all other kinds of objects.

Arrays are essentially just objects. Re-using instances where that is not clear from the syntax or context isn't without side effects or risks.

static int[] empty = new int[0];
...
   lock (empty) { ... }

If some other code locked on what its author thought was a different empty int[], you might get a deadlock that is very hard to find.

Other scenarios include using arrays as the key in a Dictionary, or anywhere else their identity matters. The framework can't just go around changing the rules.
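The dictionary scenario can be sketched as follows. Arrays do not override Equals or GetHashCode, so the default comparer treats every array instance as its own key. The class and method names here are hypothetical and exist only for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical demo class; not part of any framework.
public static class ArrayIdentityDemo
{
    // Uses two separately allocated empty arrays as keys and returns the entry count.
    public static int CountDistinctEmptyKeys()
    {
        var map = new Dictionary<int[], string>();
        map[new int[0]] = "first";
        map[new int[0]] = "second"; // a different instance, so a different key
        return map.Count;
    }

    public static void Main()
    {
        Console.WriteLine(CountDistinctEmptyKeys()); // 2
    }
}
```

If `new int[0]` silently returned a shared instance, the second assignment would overwrite the first entry and the count would drop to 1, which is the kind of rule change the answer warns about.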

edited Oct 27 '11 at 12:31

answered Oct 27 '11 at 12:22


Nice thought, but doesn’t address the question, since the same applies to the empty string. – Timwi Oct 27 '11 at 12:24
Only string *literals* use interning by default - and strings are essentially just objects too. Change your example to use `static string empty = new string(' ', 0)` - does it look any less valid? – Jon Skeet Oct 27 '11 at 12:27
Timwi, the interning of strings is well documented, as is the directive _thou shalt not lock on a string_. – H H Oct 27 '11 at 12:28
Henk's point still stands that it would have been a point considered by the CLR team if they considered this feature at all. – Adam Houldsworth Oct 27 '11 at 12:29
+1 A good reason why you should never lock on strings. I have actually helped a dev work through an issue involving exactly this situation before. – Tim Lloyd Oct 27 '11 at 12:29
@HenkHolterman: The interning of *constant string expressions* is well documented. Normally I'd expect a `new` call to create a new instance though and *not* do any interning. The directive of not locking on strings could reasonably be tied to interning, so if you were generally confident that you *weren't* reusing a string, you might decide to ignore it. You could be wrong though, in this surprising way. – Jon Skeet Oct 27 '11 at 12:32
@Jon heh; I repeat my claim that (if we had green fields) `object` should be `abstract`, and `Monitor` should be instance-based - and you should only be able to `lock` a `Monitor` instance. Oh well, that ship has sailed. – Marc Gravell Oct 27 '11 at 12:37
@MarcGravell: I think I've ranted about the design of Object many times :) – Jon Skeet Oct 27 '11 at 12:47

score 0 · Answer 2 · answered Dec 12 '11 at 19:43

Creating an object with "new" will always create a new instance, which may be locked distinctly from any other instance, and which ReferenceEquals will report as distinct from all other instances. If there were system-defined factory methods or properties to create empty arrays, similar to Enumerable.Empty<T>() or String.Empty, those could return shared object instances, but exposed constructors cannot do anything other than return a new instance or throw an exception.
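The factory idea in this answer might look like the sketch below. `EmptyArray<T>` is a hypothetical helper, not a framework type. It hands out one shared instance per element type, while `new T[0]` keeps producing fresh instances.

```csharp
using System;

// Hypothetical helper; the name is an assumption made for this example.
public static class EmptyArray<T>
{
    // Allocated once per element type T, when the type is first used.
    public static readonly T[] Instance = new T[0];
}

public static class EmptyArrayDemo
{
    // True when repeated access to the factory yields the same object.
    public static bool FactoryReturnsSharedInstance()
    {
        return object.ReferenceEquals(EmptyArray<int>.Instance, EmptyArray<int>.Instance);
    }

    // True when an array created with "new" is distinct from the shared one.
    public static bool NewReturnsFreshInstance()
    {
        return !object.ReferenceEquals(new int[0], EmptyArray<int>.Instance);
    }

    public static void Main()
    {
        Console.WriteLine(FactoryReturnsSharedInstance()); // True
        Console.WriteLine(NewReturnsFreshInstance());      // True
    }
}
```

Because callers opt in by name, sharing is visible at the call site and code that relies on `new` for a unique identity is unaffected. Later framework versions (.NET Framework 4.6 and newer) ship a built-in method of this kind, `Array.Empty<T>()`.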
