240

It's quite annoying to test all my strings for null before I can safely apply methods like ToUpper(), StartsWith(), etc.

If the default value of string were the empty string, I would not have to test, and I would feel it to be more consistent with the other value types like int or double for example. Additionally Nullable<String> would make sense.

So why did the designers of C# choose to use null as the default value of strings?

Note: This relates to this question, but is more focused on the why instead of what to do with it.

Marcel
  • 55
    Do you consider this a problem for *other* reference types? – Jon Skeet Jan 15 '13 at 12:18
  • 20
    @JonSkeet No, but only because I initially, wrongly, thought that strings are value types. – Marcel Jan 15 '13 at 12:23
  • 24
    @Marcel: That's a pretty good reason for wondering about it. – T.J. Crowder Jan 15 '13 at 12:25
  • 7
    @JonSkeet Yes. Oh yes. (But you’re no stranger to the non-nullable reference type discussion …) – Konrad Rudolph Jan 15 '13 at 14:14
  • 7
    I believe you would have a much better time if you used assertions on your strings in places where you expect them NOT to be `null` (and also I recommend that you conceptually treat `null` and empty strings as different things). A null value could be the result of an error somewhere, while an empty string should convey a different meaning. – diegoreymendez Jan 15 '13 at 17:08
  • 1
    Null is increasingly frowned upon. See "Null References: The Billion Dollar Mistake" http://qconlondon.com/london-2009/presentation/Null+References:+The+Billion+Dollar+Mistake. Or see the Google Guava library's explanation (for Java, but still relevant) http://code.google.com/p/guava-libraries/wiki/UsingAndAvoidingNullExplained – JohnCastle Jan 15 '13 at 22:44
  • 8
    @JohnCastle I dare you to ask database developers who understand the value of trinary state if you can take their nulls from them. The reason it was no good was because people don't think in trinary, it's either left or right, up or down, yes or no. Relational algebra NEEDS a trinary state. – jcolebrand Jan 16 '13 at 05:56
  • This appears to have changed in newer versions of C# https://stackoverflow.com/questions/59135545/how-to-initialize-non-nullable-strings-in-a-class – AriesConnolly Jun 21 '23 at 09:56

15 Answers

334

Why is the default value of the string type null instead of an empty string?

Because string is a reference type and the default value for all reference types is null.

It's quite annoying to test all my strings for null before I can safely apply methods like ToUpper(), StartsWith(), etc.

That is consistent with the behaviour of reference types. Before invoking their instance members, one should put a check in place for a null reference.

If the default value of string were the empty string, I would not have to test, and I would feel it to be more consistent with the other value types like int or double for example.

Assigning the default value to a specific reference type other than null would make it inconsistent.

Additionally Nullable<String> would make sense.

Nullable<T> works only with value types. Note also that Nullable<T> was not part of the original .NET platform, so a lot of existing code would have broken had that rule been changed later. (Courtesy @jcolebrand)
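To make the Nullable<T> point concrete, here is a minimal sketch. The class and method names (`NullableDemo`, `Describe`) are made up for illustration and appear nowhere in the original answer.

```csharp
public static class NullableDemo
{
    // Nullable<T> is constrained to value types (where T : struct),
    // so int? compiles while Nullable<string> does not.
    public static string Describe(int? maybeNumber, string maybeText)
    {
        // int? exposes HasValue; a string is simply compared against null.
        string number = maybeNumber.HasValue ? maybeNumber.Value.ToString() : "no number";
        string text = maybeText != null ? maybeText : "no text";
        return number + ", " + text;
    }
}
```

Both parameters can carry "no value", but only the value type needs the Nullable<T> wrapper to do so.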

Habib
  • But string has special support in several areas (string literals) so it could have been implemented (easily). – H H Jan 15 '13 at 16:01
  • 10
    @HenkHolterman One could implement a whole ton of things, but why introduce such a glaring inconsistency? –  Jan 15 '13 at 16:20
  • 4
    @delnan - "why" was the question here. – H H Jan 15 '13 at 17:20
  • 8
    @HenkHolterman And "Consistency" is the rebuttal to your point "string could be treated unlike other reference types". –  Jan 15 '13 at 17:39
  • 6
    @delnan: Working in a language that treats strings as value types, and having worked 2+ years on dotnet, I agree with Henk. I see it as a major **FLAW** in dotnet. – Fabricio Araujo Jan 15 '13 at 18:39
  • @jcolebrand Thanks for your initial edit, I would give it a 1+ if I could. I however compacted your edit a bit with a new edit. – Marcel Jan 15 '13 at 20:57
  • 1
    @delnan: One could create a value type which behaved essentially like `String`, except for (1) the value-type-ish behavior of having a usable default value, and (2) an unfortunate extra layer of boxing indirection any time it was cast to `Object`. Given that the heap representation of `string` is unique, having special treatment to avoid extra boxing wouldn't have been much of a stretch (actually, being able to specify non-default boxing behaviors would be a good thing for other types as well). – supercat Jan 15 '13 at 22:09
  • 1
    @Marcel that's fine. I wanted to make sure it was seen to be an addition, and since you were the primary interested party, I'm glad you're the one that made the edit to clean it up a bit. :D I don't need the +1s, just for SE to be a better resource in the future :D – jcolebrand Jan 15 '13 at 22:13
  • 1
    @supercat is the value treated differently because there was no Nullable at the beginning or because strings are 90% of why we use computers in the modern age? – jcolebrand Jan 15 '13 at 22:15
  • 1
    @jcolebrand: I wouldn't say 90%. Graphics and audio processing, both of which are primarily numeric, account for a pretty hefty chunk. Most of the situations which would benefit from `String` being nullable would actually benefit more from being able to use the same logic to handle maybe-valid strings and maybe-valid numeric types, than from having `string` behave as a reference type which must be handled differently. – supercat Jan 15 '13 at 22:40
  • You, sir, have never worked with a number of my previous employers ... ;-) – jcolebrand Jan 15 '13 at 23:52
41

Habib is right -- because string is a reference type.

But more importantly, you don't have to check for null each time you use it. You probably should throw an ArgumentNullException if someone passes your function a null reference, though.

Here's the thing -- the framework would throw a NullReferenceException for you anyway if you tried to call .ToUpper() on a null string. Remember that this case can still happen even if you test your arguments for null, since any property or method on the objects passed to your function as parameters may evaluate to null.

That being said, checking for empty strings or nulls is a common thing to do, so they provide String.IsNullOrEmpty() and String.IsNullOrWhiteSpace() for just this purpose.
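A rough sketch of how those two ideas can combine in one method. The `Greeter` class and its `Greet` method are hypothetical, and `nameof` assumes C# 6 or later.

```csharp
using System;

public static class Greeter
{
    // Hypothetical method: rejects null up front, treats blank input as "no name".
    public static string Greet(string name)
    {
        if (name == null)
        {
            // The parameter name is carried in the exception (ParamName).
            throw new ArgumentNullException(nameof(name));
        }

        if (string.IsNullOrWhiteSpace(name))
        {
            return "HELLO, STRANGER";
        }

        return "HELLO, " + name.ToUpperInvariant();
    }
}
```

The guard clause fails fast at the method boundary, so the rest of the body can call instance members on `name` without further null checks.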

Dave Markle
  • 32
    You should never throw a `NullReferenceException` yourself (http://msdn.microsoft.com/en-us/library/ms173163.aspx); you throw an `ArgumentNullException` if your method can't accept null refs. Also, NullRef's are typically one of the more difficult exceptions to diagnose when you're fixing issues, so I don't think the recommendation to not check for null is a very good one. – Andy Jan 15 '13 at 15:22
  • 4
    @Andy "NullRef's are typically one of the most difficult exceptions to diagnose" I strongly disagree, if you log stuff it's really easy to find & fix (just handle the null case). – Louis Kottmann Jan 15 '13 at 16:03
  • 6
    Throwing `ArgumentNullException` has the additional benefit of being able to provide the parameter name. During debugging, this saves... err, seconds. But important seconds. – Kos Jan 15 '13 at 16:43
  • Of course, if one could specify that certain instance methods should be called directly without regard for whether they are invoked on null references (as happens with extension methods), the horribly ugly syntax `String.IsNullOrEmpty(myString)` could be replaced with `myString.IsNullOrEmpty`. – supercat Jan 15 '13 at 22:11
  • Not sure what "ugly" means here, but if it means "consistent with everything else in the language and not hard to understand", then I guess it's ugly. – Dave Markle Jan 17 '13 at 11:47
  • 2
    @DaveMarkle you may want to include IsNullOrWhitespace too http://msdn.microsoft.com/en-us/library/system.string.isnullorwhitespace.aspx – Nathan Koop Jan 17 '13 at 19:31
  • 1
    I really think checking for null everywhere is a source of immense code bloat. it's ugly, and it looks hacky and it's hard to stay consistent. I think (at least in C#-like languages) a good rule is "ban the null keyword in production code, use it like crazy in test code". – sara Mar 27 '16 at 19:06
23

You could write an extension method (for what it's worth):

public static string EmptyNull(this string str)
{
    return str ?? "";
}

Now this works safely:

string str = null;
string upper = str.EmptyNull().ToUpper();
user2711965
Tim Schmelter
  • 111
    **But please don't.** The last thing another programmer wants to see is thousands of lines of code peppered with .EmptyNull() everywhere just because the first guy was "scared" of exceptions. – Dave Markle Jan 15 '13 at 12:29
  • 17
    @DaveMarkle: But obviously it's exactly what OP was looking for. _"It's quite annoying to test all my strings for null before I can safely apply methods like ToUpper(), StartWith() etc"_ – Tim Schmelter Jan 15 '13 at 12:30
  • 20
    The comment was to the OP, not to you. While your answer is clearly correct, a programmer asking a basic question such as this should be strongly cautioned against actually putting your solution into WIDE practice, as often is their wont. There are a number of tradeoffs you don't discuss in your answer, such as opaqueness, increased complexity, difficulty of refactoring, potential overuse of extension methods, and yes, performance. Sometimes (many times) a correct answer is not the right path, and this is why I commented. – Dave Markle Jan 15 '13 at 12:48
  • 2
    Many people seem to share your opinion. Note that I've not encouraged replacing all occurrences of `string` with `EmptyNull`. It's just a direct answer to OP's requirement. Many programmers know what they are doing or are working on their own (as I am). Btw, here I've found a question which targets this issue: stackoverflow.com/questions/8536740/… – Tim Schmelter Jan 15 '13 at 15:23
  • 1
    @DaveMarkle The last thing another programmer wants to deal with are NullRefExceptions everywhere because proper null checking wasn't done. – Andy Jan 15 '13 at 15:24
  • 5
    @Andy: The solution to not having proper null checking done is to properly check for nulls, not to put a band-aid on a problem. – Dave Markle Jan 15 '13 at 16:49
  • 7
    If you're going through the trouble of writing `.EmptyNull()`, why not simply use `(str ?? "")` instead where it is needed? That said, I agree with the sentiment expressed in @DaveMarkle's comment: you probably shouldn't. `null` and `String.Empty` are conceptually different, and you can't necessarily treat one the same as another. – user Jan 15 '13 at 17:27
  • 3
    Sometimes it's nice to have clean looking Extension methods like this, not having to slap value ?? "" everywhere. – Patrick Magee Apr 05 '13 at 00:17
  • 3
    Of course you could go even further in this (bad?) direction with `public static string ToUpperSafe(this string str) { return str == null ? null : str.ToUpper(); }` and so on... – Jeppe Stig Nielsen Jul 23 '13 at 19:27
18

You could also use the following, as of C# 6.0

string myString = null;
string result = myString?.ToUpper();

The string result will be null.
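If an empty string is wanted instead of null, the null-conditional operator can be combined with ??. A small sketch, with hypothetical class and method names:

```csharp
public static class NullConditionalDemo
{
    // Returns null when input is null, because ?. skips the call.
    public static string UpperOrNull(string input)
    {
        return input?.ToUpperInvariant();
    }

    // Combining ?. with ?? substitutes an empty string for the null result.
    public static string UpperOrEmpty(string input)
    {
        return input?.ToUpperInvariant() ?? "";
    }
}
```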

pensono
russelrillema
14

Empty strings and nulls are fundamentally different. A null is an absence of a value and an empty string is a value that is empty.

If the programming language assumed a "value" for a variable, in this case an empty string, that would be no better than initializing the string with any other value that avoids a null reference problem.

Also, if you pass the handle to that string variable to other parts of the application, then that code will have no ways of validating whether you have intentionally passed a blank value or you have forgotten to populate the value of that variable.

Another occasion where this would be a problem is when the string is a return value from some function. Since string is a reference type and can technically hold either null or an empty value, the function can also return either (there is nothing to stop it from doing so). Now, since there are two notions of the "absence of a value", i.e. an empty string and a null, all the code that consumes this function has to do two checks: one for empty and the other for null.

In short, it's always good to have only one representation for a single state. For a broader discussion on empty and nulls, see the links below.

https://softwareengineering.stackexchange.com/questions/32578/sql-empty-string-vs-null-value

NULL vs Empty when dealing with user input

Abbas Gadhia
  • 2
    And how exactly do you see this difference, say in a text box? Did the user forget to enter a value in the field, or are they purposefully leaving it blank? Null in a programming language does have a specific meaning; unassigned. We know it doesn't have a value, which is not the same as a database null. – Andy Jan 15 '13 at 15:43
  • 1
    There's not much difference when you use it with a text box. Either way, having one notation to represent the absence of a value in a string is paramount. If I had to pick one, I'd pick null. – Abbas Gadhia Jan 15 '13 at 16:28
  • In Delphi, string is a value type and therefore can't be null. It makes life a lot easier in this respect - I find it really annoying that string was made a reference type. – Fabricio Araujo Jan 15 '13 at 17:38
  • 1
    Under COM (the Component Object Model), which predated .net, a string type would either hold a pointer to the string's data, or `null` to represent the empty string. There are a number of ways .net could have implemented similar semantics, had they chosen to do so, especially given that `String` has a number of characteristics that make it a unique type anyway [e.g. it and the two array types are the only types whose allocation size isn't constant]. – supercat Jan 15 '13 at 21:41
8

The fundamental reason/problem is that the designers of the CLS specification (which defines how languages interact with .net) did not define a means by which class members could specify that they must be called directly, rather than via callvirt, without the caller performing a null-reference check; nor did it provide a means of defining structures which would not be subject to "normal" boxing.

Had the CLS specification defined such a means, then it would be possible for .net to consistently follow the lead established by the Component Object Model (COM), under which a null string reference was considered semantically equivalent to an empty string, and for other user-defined immutable class types which are supposed to have value semantics to likewise define default values. Essentially, what would happen would be for each member of String, e.g. Length to be written as something like [InvokableOnNull()] int String Length { get { if (this==null) return 0; else return _Length;} }. This approach would have offered very nice semantics for things which should behave like values, but because of implementation issues need to be stored on the heap. The biggest difficulty with this approach is that the semantics of conversion between such types and Object could get a little murky.

An alternative approach would have been to allow the definition of special structure types which did not inherit from Object but instead had custom boxing and unboxing operations (which would convert to/from some other class type). Under such an approach, there would be a class type NullableString which behaves as string does now, and a custom-boxed struct type String, which would hold a single private field Value of type String. Attempting to convert a String to NullableString or Object would return Value if non-null, or String.Empty if null. Attempting to cast to String, a non-null reference to a NullableString instance would store the reference in Value (perhaps storing null if the length was zero); casting any other reference would throw an exception.

Even though strings have to be stored on the heap, there is conceptually no reason why they shouldn't behave like value types that have a non-null default value. Having them be stored as a "normal" structure which held a reference would have been efficient for code that used them as type "string", but would have added an extra layer of indirection and inefficiency when casting to "object". While I don't foresee .net adding either of the above features at this late date, perhaps designers of future frameworks might consider including them.
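The "structure which holds a reference" idea can be approximated in today's C# with an ordinary struct. This is only a sketch: `ValueString` is a hypothetical type, not part of .NET, and it still pays the extra boxing cost described above when converted to object.

```csharp
// Hypothetical wrapper, not part of .NET: a value type whose default
// (a zero-initialized struct, so a null inner reference) acts as an empty string.
public struct ValueString
{
    private readonly string _value; // null in default(ValueString)

    public ValueString(string value)
    {
        _value = value;
    }

    // Members test the inner reference instead of dereferencing it blindly.
    public int Length
    {
        get { return _value == null ? 0 : _value.Length; }
    }

    public ValueString ToUpper()
    {
        return new ValueString(_value == null ? null : _value.ToUpperInvariant());
    }

    public override string ToString()
    {
        return _value ?? string.Empty;
    }
}
```

With this shape, `default(ValueString)` and the elements of a freshly allocated `ValueString[]` are usable immediately, because no member ever dereferences the inner null.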

supercat
  • 1
    Speaking as someone who works in SQL a lot, and has dealt with the headache of Oracle not making a distinction between NULL and zero-length, I am very glad that .NET *does*. "Empty" is a value, "null" is not. –  Jan 15 '13 at 18:32
  • @JonofAllTrades: I disagree. On application code, except dealing with db code, there's no meaning an string being treated as a class. It's a value type and a basic one. Supercat: +1 to you – Fabricio Araujo Jan 15 '13 at 18:37
  • 1
    Database code is a big "except". As long as there are *some* problem domains where you need to distinguish between "present/known, an empty string" and "not present/unknown/inapplicable", such as databases, then the language needs to support it. Of course now that .NET has `Nullable<>`, strings could be reimplemented as value types; I can't speak to the costs and benefits of such a choice. –  Jan 15 '13 at 18:42
  • 3
    @JonofAllTrades: Code that deals with numbers has to have an out-of-band means of distinguishing the default value zero from "undefined". As it is, nullable-handling code that works with strings and numbers has to use one method for nullable strings and another for nullable numbers. Even if a nullable class type `string` is more efficient than `Nullable` would be, having to use the "more efficient" method is more burdensome than being able to use the same approach for all nullable database values. – supercat Jan 15 '13 at 21:34
7

Why did the designers of C# choose to use null as the default value of strings?

Because strings are reference types, and the default value of a reference type is null. Variables of reference types store references to the actual data.

Let's use the default keyword to illustrate:

string str = default(string); 

str is a string, so it is a reference type, so default value is null.

int num = default(int);

num is an int, so it is a value type, so default value is zero.
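The same defaults apply to fields and array elements that are never assigned. A short sketch, with hypothetical type names `Sample` and `DefaultsDemo`:

```csharp
public class Sample
{
    public string Name; // reference type field: starts as null
    public int Count;   // value type field: starts as 0
}

public static class DefaultsDemo
{
    public static bool FreshNameIsNull()
    {
        return new Sample().Name == null;
    }

    public static int FreshCount()
    {
        return new Sample().Count;
    }

    public static bool FreshArrayElementIsNull()
    {
        string[] names = new string[3]; // elements are zero-initialized
        return names[0] == null;
    }
}
```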

Soner Gönül
5

Because a string variable is a reference, not an instance.

Initializing it to Empty by default would have been possible but it would have introduced a lot of inconsistencies all over the board.

H H
  • 3
    There's no particular reason `string` would have to be a reference type. To be sure, the actual characters that make up the string certainly have to be stored on the heap, but given the amount of dedicated support that strings have in the CLR already, it would not be a stretch to have `System.String` be a value type with a single private field `Value` of type `HeapString`. That field would be a reference type, and would default to `null`, but a `String` struct whose `Value` field was null would behave as an empty string. The only disadvantage of this approach would be... – supercat Jan 17 '13 at 02:54
  • 1
    ...that casting a `String` to `Object` would, in the absence of special-case code in the runtime, cause the creation of a boxed `String` instance on the heap, rather than simply copying a reference to the `HeapString`. – supercat Jan 17 '13 at 02:58
  • 1
    @supercat - nobody is saying that string should/could be a value type. – H H Jan 17 '13 at 08:07
  • 1
    Nobody except me. Having string be a "special" value type (with a private reference-type field) would allow most handling to be essentially as efficient as it is now, except for an added null check on methods/properties like `.Length` etc. so that instances which hold a null reference would not attempt to dereference it but instead behave as appropriate for an empty string. Whether the Framework would be better or worse with `string` implemented that way, if one wanted `default(string)` to be an empty string... – supercat Jan 17 '13 at 15:27
  • 1
    ...having `string` be a value-type wrapper on a reference-type field would be the approach that required the fewest changes to other parts of .net [indeed, if one were willing to accept having conversion from `String` to `Object` create an extra boxed item, one could simply have `String` be an ordinary struct with a field of type `Char[]` which it never exposed]. I think having a `HeapString` type would probably be better, but in some ways the value-type string holding a `Char[]` would be simpler. – supercat Jan 17 '13 at 15:34
  • You know, when 1 comment isn't enough, you probably shouldn't post as a comment. The lack of formatting increases the TL;DR factor. – H H Jan 17 '13 at 20:08
5

Perhaps using the ?? operator when assigning your string variable would help.

string str = SomeMethodThatReturnsaString() ?? "";
// if SomeMethodThatReturnsaString() returns a null value, "" is assigned to str.
Amen Jlili
4

If the default value of string were the empty string, I would not have to test

Wrong! Changing the default value doesn't change the fact that it's a reference type and someone can still explicitly set the reference to be null.

Additionally Nullable<String> would make sense.

True point. It would make more sense to not allow null for any reference types, instead requiring Nullable<TheRefType> for that feature.

So why did the designers of C# choose to use null as the default value of strings?

Consistency with other reference types. Now, why allow null in reference types at all? Probably so that it feels like C, even though this is a questionable design decision in a language that also provides Nullable.

Dan Burton
  • 4
    It could be because Nullable was only introduced in the .NET 2.0 Framework, so before then it wasn't available? – jcolebrand Jan 15 '13 at 19:36
  • 3
    Thanks Dan Burton for pointing out that someone CAN set the initialized value to null on reference types later on. Thinking this through tells me that my original intent in the question leads to no use. – Marcel Jan 15 '13 at 21:10
2

A String is an immutable object which means when given a value, the old value doesn't get wiped out of memory, but remains in the old location, and the new value is put in a new location. So if the default value of String a was String.Empty, it would waste the String.Empty block in memory when it was given its first value.

Although it seems minuscule, it could turn into a problem when initializing a large array of strings with default values of String.Empty. Of course, you could always use the mutable StringBuilder class if this was going to be a problem.

djv
  • Thanks for mentioning the "first initialisation" thing. – Marcel Jan 15 '13 at 21:00
  • 3
    How would it be a problem when initializing a large array? Since, as you said, Strings are immutable, all elements of the array would simply be pointers to the same `String.Empty`. Am I mistaken? – Dan Burton Jan 15 '13 at 21:09
  • 2
    The default value for *any* type is going to have all bits set to zero. The only way for the default value of `string` to be an empty string is to allow all-bits-zero as a representation of an empty string. There are a number of ways this could be accomplished, but I don't think any involve initializing references to `String.Empty`. – supercat Jan 15 '13 at 22:15
  • Other answers discussed this point as well. I think people have concluded that it wouldn't make sense to treat the String class as a special case and provide something other than all-bits-zero as an initialization, even if it was something like `String.Empty` or `""`. – djv Jan 15 '13 at 22:56
  • @DanV: Changing the initialization behavior of `string` storage locations would have required also changing the initialization behavior of all structs or classes which have fields of type `string`. That would represent a pretty big change in the design of .net, which presently expects to zero-initialize any type without even having to think about what it is, save only for its total size. – supercat Jan 17 '13 at 02:50
2

Because string is a reference type, and the default value for a reference type is null.

Akshay
1

Since you mentioned ToUpper(), and this usage is how I found this thread, I will share this shortcut (string ?? "").ToUpper():

    private string _city;
    public string City
    {
        get
        {
            return (this._city ?? "").ToUpper();
        }
        set
        {
            this._city = value;
        }
    }

Seems better than:

        if(null != this._city)
        { this._city = this._city.ToUpper(); }
Spencer Sullivan
0

Maybe the string keyword confused you, as it looks exactly like any other value type declaration, but it is actually an alias to System.String as explained in this question.
Also the dark blue color in Visual Studio and the lowercase first letter may mislead into thinking it is a struct.

Alessandro Da Rugna
  • 3
    Isn't the same true of the `object` keyword? Though admittedly, that's far less used than `string`. –  Jan 15 '13 at 15:39
  • 2
    As `int` is an alias for `System.Int32`. What's your point? :) – Thorarin Jan 15 '13 at 19:25
  • @Thorarin @delnan : They're both aliases, but `System.Int32` is a `Struct` thus having a default value while `System.String` is a `Class` having a pointer with default value of `null`. They're visually presented in the same font/color. Without knowledge, one can think they act the same way (=having a default value). My answer was written with a http://en.wikipedia.org/wiki/Cognitive_psychology cognitive psychology idea behind it :-) – Alessandro Da Rugna Jan 16 '13 at 10:39
  • I am fairly certain that Anders Hejlsberg said it in a Channel 9 interview. I know the difference between heap and stack, but the idea with C# is that the casual programmer doesn't need to. – Thomas Koelle Feb 20 '15 at 08:41
0

Nullable types did not come in until 2.0.

If nullable types had existed from the beginning of the language, then string would have been non-nullable and string? would have been nullable. But they could not do this due to backward compatibility.

A lot of people talk about ref-type or not ref type, but string is an out of the ordinary class and solutions would have been found to make it possible.
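For what it's worth, C# 8 later added opt-in nullable reference types, which give the string / string? split at the level of compiler warnings while leaving the runtime default of null unchanged. A minimal sketch, assuming C# 8 or later; the class and method names are hypothetical:

```csharp
#nullable enable

public static class NullableAnnotationsDemo
{
    // With annotations enabled, `string` is treated as non-nullable and
    // `string?` as nullable; violations produce warnings, not errors.
    public static int SafeLength(string? maybeText)
    {
        // Dereferencing maybeText directly here would trigger a compiler warning.
        return maybeText?.Length ?? 0;
    }
}
```

Because this is purely compile-time analysis, `default(string)` is still null even with annotations enabled.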

Thomas Koelle