6

It seems like .NET goes out of its way to make strings that are equal by value equal by reference.

In LINQPad, I tried the following, hoping it'd bypass interning string constants:

var s1 = new string("".ToCharArray());
var s2 = new string("".ToCharArray());

object.ReferenceEquals(s1, s2).Dump();

but that returns true. However, I want to create a string that's reliably distinguishable from any other string object.

(The use case is creating a sentinel value to use for an optional parameter. I'm wrapping WebForms' Page.Validate(), and I want to choose the appropriate overload depending on whether the caller gave me the optional validation group argument. So I want to be able to detect whether the caller omitted that argument or passed a value that happens to be equal to my default value. Obviously there are other, less arcane ways of approaching this specific use case; the aim of this question is more academic.)

millimoose
  • 3
    string.Empty has special handling. The behavior you describe does not apply to strings that have Length > 0. – phoog Dec 12 '12 at 03:17

4 Answers

6

It seems like .NET goes out of its way to make strings that are equal by value equal by reference.

Actually, there are really only two special cases for strings that exhibit behavior like what you're describing here:

  1. String literals in your code are interned, so the same literal in two places will result in a reference to the same object.
  2. The empty string is a particularly weird case, where as far as I know literally every empty string in a .NET program is in fact the same object (i.e., "every empty string" constitutes a single string). This is the only case I know of in .NET where using the new keyword (on a class) may potentially not result in the allocation of a new object.
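A minimal sketch of the two cases above, for anyone who wants to check them locally. The class and method names are made up for illustration, and the empty-string result is behavior observed on the runtimes discussed in this thread rather than something the language is known to guarantee.

```csharp
using System;

// Hypothetical demo class; none of these names come from the question.
public static class EmptyStringDemo
{
    // Case 1: the same literal in two places refers to one interned object.
    public static bool LiteralsShareReference()
    {
        return object.ReferenceEquals("Ab", "Ab");
    }

    // Case 2: the string constructor hands back string.Empty for a
    // zero-length array instead of allocating a new object.
    public static bool EmptyFromConstructorIsStringEmpty()
    {
        var empty = new string(new char[0]);
        return object.ReferenceEquals(empty, string.Empty);
    }

    // Neither case applies to a non-empty string built at run time,
    // so each constructor call yields a distinct object.
    public static bool NonEmptyConstructedStringsAreDistinct()
    {
        var s1 = new string(new[] { 'x' });
        var s2 = new string(new[] { 'x' });
        return !object.ReferenceEquals(s1, s2);
    }

    public static void Main()
    {
        Console.WriteLine(LiteralsShareReference());
        Console.WriteLine(EmptyFromConstructorIsStringEmpty());
        Console.WriteLine(NonEmptyConstructedStringsAreDistinct());
    }
}
```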

From your question I get the impression you already knew about the first case. The second case is the one you've stumbled across. As others have pointed out, if you just go ahead and use a non-empty string, you'll find it's quite easy to create a string that isn't reference-equal to any other string in your program:

public static string Sentinel = new string(new char[] { 'x' });

As a little editorial aside, I actually wouldn't mind this so much (as long as it were documented); but it kind of irks me that the CLR folks (?) implemented this optimization without also going ahead and doing the same for arrays. That is, it seems to me they might as well have gone ahead and made every new T[0] refer to the same object too. Or, you know, not done that for strings either.

Dan Tao
  • Fascinating. Is there anything in the spec that expounds on this nuance? I admit I was startled there was *ever* a case when `new Anything(...)` would ever not return two distinct instances. – Kirk Woll Dec 12 '12 at 03:44
  • 1
    @Kirk: I'm not sure; I haven't scoured the spec for an explanation. But for some reason I want to say it isn't documented (I have some vague, unreliable memory of looking into it at one point). In any case, it's certainly surprising to pretty much every dev who independently discovers it! – Dan Tao Dec 12 '12 at 03:59
3

If the strings are ReferenceEqual, they are the same object. When you call new string(new char[0]), you don't get a new object that happens to be reference-equal to string.Empty; that would be impossible. Rather, you get a new reference to the already-created string.Empty instance. This is a result of special-case code in the string constructor.

Try this:

var s1 = new string(new char[] { 'A', 'b' });
var s2 = new string(new char[] { 'A', 'b' });

object.ReferenceEquals(s1, s2).Dump();

Also, beware that string constants are interned, so all instances of the literal "Ab" in your code will be reference equal to one another, because they all refer to the same string object. Constant folding applies, too, so the constant expression "A" + "b" will also be reference equal to "Ab".
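To make the interning and constant-folding point concrete, here is a small sketch. The class and method names are hypothetical; `string.Intern` is the standard library call that returns the pooled instance for a given value.

```csharp
using System;

// Hypothetical demo class illustrating literal interning.
public static class InterningDemo
{
    // "A" + "b" is a constant expression, so the compiler folds it
    // into the literal "Ab" and both sides refer to the same object.
    public static bool FoldedConstantMatchesLiteral()
    {
        const string folded = "A" + "b";
        return object.ReferenceEquals(folded, "Ab");
    }

    // A string built at run time from a char[] is a separate object,
    // even though its value equals the literal's.
    public static bool RuntimeBuiltStringIsDistinct()
    {
        var built = new string(new[] { 'A', 'b' });
        return !object.ReferenceEquals(built, "Ab");
    }

    // Explicitly interning the run-time string yields the pooled
    // instance, which is the one the literal refers to.
    public static bool InternedCopyMatchesLiteral()
    {
        var built = new string(new[] { 'A', 'b' });
        return object.ReferenceEquals(string.Intern(built), "Ab");
    }

    public static void Main()
    {
        Console.WriteLine(FoldedConstantMatchesLiteral());
        Console.WriteLine(RuntimeBuiltStringIsDistinct());
        Console.WriteLine(InternedCopyMatchesLiteral());
    }
}
```

The practical consequence for a sentinel is that it has to be built at run time and never passed through `string.Intern`, otherwise a literal with the same value would match it by reference.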

Your sentinel value, therefore, can be a privately-created non-zero-length string.

phoog
  • @KirkWoll well, sure. The string constructor is an internal call, so it's handled by the runtime. But it's still the string constructor, isn't it? – phoog Dec 12 '12 at 03:44
  • 2
    It's not just that it's interning. `object.ReferenceEquals("a", "a")` makes sense to me. `object.ReferenceEquals(new string("a"), new string("a"))` surprises me. The former is the case of the *compiler* optimizing string literals. The latter is the case of the *runtime* optimizing string literals. – Kirk Woll Dec 12 '12 at 03:47
  • 2
    @KirkWoll there's no string constructor that takes a string as an argument, so if your second example compiles, you are right to be surprised. If you pass a 1-element `char[]`, containing `'a'`, you'll find that the return value is `false`. – phoog Dec 12 '12 at 03:56
  • Sure, sorry I was sloppy. What I was surprised about was this: `new string("".ToCharArray()) == new string("".ToCharArray())` (that returns `true`) – Kirk Woll Dec 12 '12 at 04:47
  • 1
    Ah, right, that *is* surprising. But I'm not saying that it is because of interning. *That* is because of the special treatment for zero-length strings (apparently introduced in CLR 2.0). – phoog Dec 12 '12 at 04:56
1

You can put non-printable characters into the string... even the 0/nul character. But really, I'd just use null for the sentinel value, and try to ensure code elsewhere is using the empty string instead of null.
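A rough sketch of the null-as-sentinel approach, assuming callers never need to pass null deliberately. `ValidateAll` and `ValidateGroup` are hypothetical stand-ins for the two `Page.Validate` overloads so the example compiles without WebForms; they return a description of which overload would be called.

```csharp
using System;

// Hypothetical wrapper; not the asker's actual code.
public static class NullSentinelDemo
{
    // Stand-in for Page.Validate().
    private static string ValidateAll()
    {
        return "Validate()";
    }

    // Stand-in for Page.Validate(string validationGroup).
    private static string ValidateGroup(string validationGroup)
    {
        return "Validate(\"" + validationGroup + "\")";
    }

    // null is the default, so an omitted argument arrives as null.
    // The empty string remains an ordinary, explicit group name.
    public static string Validate(string validationGroup = null)
    {
        return validationGroup == null
            ? ValidateAll()
            : ValidateGroup(validationGroup);
    }

    public static void Main()
    {
        Console.WriteLine(Validate());
        Console.WriteLine(Validate(""));
        Console.WriteLine(Validate("login"));
    }
}
```

The trade-off is that an explicit `Validate(null)` is indistinguishable from an omitted argument, which is why the rest of the code would need to use the empty string rather than null.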

Joel Coehoorn
0

So I want to be able to detect whether the caller omitted that argument, or whether he passed a value that happens to be equal to my default value.

I've never done this before, but my thought would be to make a class like Nullable<T>... but instead of Nullable<T> it would be Parameter<T>, and it would keep track of whether or not it has been assigned anything (including null).
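One way this idea might look, as an untested-in-production sketch: a small generic struct whose default value means "not assigned". Everything here (`Parameter<T>`, `IsAssigned`, `Describe`) is a hypothetical name chosen for illustration.

```csharp
using System;

// Hypothetical wrapper that records whether a value was supplied at all.
public struct Parameter<T>
{
    private readonly T value;
    private readonly bool isAssigned;

    public Parameter(T value)
    {
        this.value = value;
        this.isAssigned = true;
    }

    // False for default(Parameter<T>), i.e. when the argument was omitted.
    public bool IsAssigned
    {
        get { return isAssigned; }
    }

    public T Value
    {
        get
        {
            if (!isAssigned)
                throw new InvalidOperationException("No value was assigned.");
            return value;
        }
    }

    // Lets callers pass a plain T where a Parameter<T> is expected.
    public static implicit operator Parameter<T>(T value)
    {
        return new Parameter<T>(value);
    }
}

public static class ParameterDemo
{
    // default(Parameter<string>) is a legal optional-parameter default
    // because Parameter<T> is a struct.
    public static string Describe(Parameter<string> group = default(Parameter<string>))
    {
        if (!group.IsAssigned)
            return "omitted";
        return group.Value == null
            ? "explicit null"
            : "explicit \"" + group.Value + "\"";
    }

    public static void Main()
    {
        Console.WriteLine(Describe());
        Console.WriteLine(Describe("login"));
        Console.WriteLine(Describe((string)null));
    }
}
```

Unlike a string sentinel, this distinguishes an omitted argument even from an explicit null, although a caller can still pass `default(Parameter<string>)` on purpose.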

Ian R. O'Brien
NPSF3000