37

I was asked this question in an interview: is string a reference type or a value type?

I said it's a reference type. Then he asked me why we don't use the new operator when initializing a string. I said it's because the C# language has a simpler syntax for creating a string, and the compiler automatically converts the code into a call to the constructor of the System.String class.

Is this answer correct or not?

teenup
  • [Mostly correct](http://msdn.microsoft.com/en-us/library/system.string_members.aspx), but strings are a bit more complex, and are cached and shared in weird ways. Good luck. – Kobi Jul 25 '10 at 09:52

7 Answers

33

Strings are immutable reference types. There's the ldstr IL instruction which allows pushing a new object reference to a string literal. So when you write:

string a = "abc";

The compiler checks whether the "abc" literal has already been defined in the metadata and, if not, declares it. Then it translates this code into the following IL instruction:

ldstr "abc"

This basically makes the local variable a point to the string literal defined in the metadata.

So I would say that your answer is not quite right as the compiler doesn't translate this into a call to a constructor.

Darin Dimitrov
31

Not exactly the right answer. Strings are "special" reference types: they are immutable. You are right that the compiler does something internally, but it is not a constructor call. It emits an ldstr instruction, which pushes an object reference to a string literal stored in the metadata.

Sample C# code:

class Program
{
    static void Main()
    {
        string str;
        string initStr = "test";
    }
}

And here is the IL code:

.method private hidebysig static void  Main() cil managed
{
  .entrypoint
  // Code size       8 (0x8)
  .maxstack  1
  .locals init ([0] string str,
           [1] string initStr)
  IL_0000:  nop
  IL_0001:  ldstr      "test"
  IL_0006:  stloc.1
  IL_0007:  ret
} // end of method Program::Main

You can see the ldstr instruction above.

Moreover, because strings are immutable, it is possible to keep only one copy of each distinct string. Interned strings (which include all string literals) are kept in a hash table where the key is the string value and the value is the reference to that string. Each time a string literal is loaded, the CLR checks whether such a string is already in the hash table. If it is, no new memory is allocated and the reference is set to the existing string.

You can run this code to check:

using System;

class Program
{
    static void Main()
    {
        string someString = "abc";
        string otherString = "efg";

        // prints False: the two variables refer to different literals
        Console.WriteLine(Object.ReferenceEquals(someString, otherString));

        someString = "efg";

        // prints True: both variables now refer to the same "efg" literal
        Console.WriteLine(Object.ReferenceEquals(someString, otherString));
    }
}
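
To complement the check above, here is a minimal sketch of what happens with a string that is built at run time instead of written as a literal. The class name InternDemo and the variable names are made up for illustration; string.Intern and Object.ReferenceEquals are standard framework methods.

```csharp
using System;

class InternDemo
{
    static void Main()
    {
        string literal = "efg";

        // Built at run time from a char array, so it is a separate object
        // even though it holds the same characters as the literal.
        string built = new string(new[] { 'e', 'f', 'g' });

        // Value comparison: prints True
        Console.WriteLine(literal == built);

        // Reference comparison: prints False
        Console.WriteLine(Object.ReferenceEquals(literal, built));

        // string.Intern looks the value up in the intern pool and returns
        // the pooled reference, adding the string if it is not there yet.
        string interned = string.Intern(built);
        Console.WriteLine(Object.ReferenceEquals(literal, interned));
    }
}
```

The last line should print True whenever the "efg" literal is in the intern pool, which is the situation this answer describes. The point of the sketch is that only literals and explicitly interned strings share a reference; strings created at run time do not, unless you ask for it.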
Incognito
  • Great! Thank you for the in-depth explanation, especially about internal hash table. I never thought about that. – NDeveloper Jul 25 '10 at 10:30
  • All strings are not kept in a hash table, that is only strings that are interned. String literals are interned, but any new strings that are created are not interned automatically. – Guffa Jun 10 '13 at 07:30
  • great explanation. I was just wondering what happens when we assign new value to string variable. Since I figured out string is a class. – ExpertLoser Sep 18 '17 at 20:59
14

Well, it's correct that the compiler has special syntax that simplifies the creation of strings.

The part about the compiler producing a call to the constructor is not really correct. String literals are created when the application starts, so where the string literal is used it's only an assignment of a reference to an already existing object.

If you assign a string literal in a loop:

string[] items = new string[10];
for (int i = 0; i < 10; i++) {
  items[i] = "test";
}

it will not create a new string object for each iteration, it will just copy the same reference into each item.

Two other noteworthy things about string literals are that the compiler doesn't create duplicates, and that it automatically combines them if you concatenate them. If you use the same literal string more than once, it will use the same object:

string a = "test";
string b = "test";
string c = "te" + "st";

The variables a, b and c all point to the same object.
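
A quick way to check this is to compare the references directly. The sketch below is only an illustration: the class name LiteralDemo and the extra variables te and d are not from the answer. They are added to contrast a compile-time constant expression with a concatenation that involves a non-constant local.

```csharp
using System;

class LiteralDemo
{
    static void Main()
    {
        string a = "test";
        string b = "test";
        string c = "te" + "st";   // constant expression, combined by the compiler

        string te = "te";
        string d = te + "st";     // involves a variable, so it is concatenated at run time

        Console.WriteLine(Object.ReferenceEquals(a, b));  // True
        Console.WriteLine(Object.ReferenceEquals(a, c));  // True
        Console.WriteLine(Object.ReferenceEquals(a, d));  // reference comparison with the run-time string
        Console.WriteLine(a == d);                        // True (same characters)
    }
}
```

With the current C# compiler, the third comparison is expected to print False, because d is a new object produced by a concatenation call rather than a literal loaded from metadata.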

The string class also has constructors that you can use:

string[] items = new string[10];
for (int i = 0; i < 10; i++) {
  items[i] = new String('*', 42);
}

In this case you will actually get ten separate string objects.
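
The difference between the two loops can be observed with Object.ReferenceEquals. This is a small sketch that runs both assignments side by side; the class name LoopDemo and the array names are chosen here for illustration only.

```csharp
using System;

class LoopDemo
{
    static void Main()
    {
        string[] literals = new string[10];
        string[] constructed = new string[10];

        for (int i = 0; i < 10; i++)
        {
            literals[i] = "test";                  // same reference every iteration
            constructed[i] = new String('*', 42);  // new object every iteration
        }

        // Same object: prints True
        Console.WriteLine(Object.ReferenceEquals(literals[0], literals[9]));

        // Different objects: prints False
        Console.WriteLine(Object.ReferenceEquals(constructed[0], constructed[9]));

        // Same contents: prints True
        Console.WriteLine(constructed[0] == constructed[9]);
    }
}
```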

Guffa
4

Nope. The compiler does not change the construction. What type should the constructor argument be? String? ;-)

String literals are constants without a name.

In addition, you can initialize any class with a string literal, if it defines an implicit conversion operator from string:

   public class UnitTest1 {
      class MyStringable {
         public static implicit operator MyStringable(string value) {
            return new MyStringable();
         }
      }

      [TestMethod]
      public void MyTestMethod() {
         MyStringable foo = "abc";
      }
   }


Edit: To be clearer: since you asked whether a string literal is converted into a constructor call, let's have a look at the IL code.

Take this test method:

   [TestClass]
   class MyClass {
      [TestMethod]
      public void MyTest() {
         string myString = "foo";
         if (myString == "bar")
            Console.WriteLine("w00t");
      }
   }

It produces the following IL code:

.method public hidebysig instance void MyTest() cil managed
{
    .custom instance void [Microsoft.VisualStudio.QualityTools.UnitTestFramework]Microsoft.VisualStudio.TestTools.UnitTesting.TestMethodAttribute::.ctor()
    .maxstack 2
    .locals init (
        [0] string myString,
        [1] bool CS$4$0000)
    L_0000: nop 
    L_0001: ldstr "foo"
    L_0006: stloc.0 
    L_0007: ldloc.0 
    L_0008: ldstr "bar"
    L_000d: call bool [mscorlib]System.String::op_Equality(string, string)
    L_0012: ldc.i4.0 
    L_0013: ceq 
    L_0015: stloc.1 
    L_0016: ldloc.1 
    L_0017: brtrue.s L_0024
    L_0019: ldstr "w00t"
    L_001e: call void [mscorlib]System.Console::WriteLine(string)
    L_0023: nop 
    L_0024: ret 
}

As you can see, all string values (foo, bar and w00t) are still loaded as strings with ldstr, and no hidden constructor is called.

Hope this explains it better.

Florian Reischl
  • That is massively cool, but I don't quite see how it helps explaining strings. – Kobi Jul 25 '10 at 09:51
  • The first part tried to be the explanation. Strings are a built-in C# language feature. String literals are strings and will never be changed by the compiler into anything provided to a string constructor. Maybe the answer is too sloppy... Sorry! – Florian Reischl Jul 25 '10 at 09:55
  • That would need a specification of the constructor to be accurate. The C# specification simply states that two literals in the same assembly should point to the same instance. So it would be perfectly OK to rewrite the expression to use the String(char[]) constructor and cache that instance for later uses. Further, for an instance to exist it needs to be created at some point :) – Rune FS Jul 25 '10 at 10:07
1

As everyone said, strings are immutable, and there is no implicit constructor call. I would like to add the following reference for you, which may clear the air a bit more:

String Immutability

MD Sayem Ahmed
0

But we can use the new operator to initialize a string:

String str = new String(new char[] {'s','t','r'});

Is this answer correct or not ?

No. String literals are cached and used as-is, as can be seen in the IL.

0

Here's my take. I'm not entirely sure, so take my answer with a grain of salt.

String literals in .NET are self-contained: the length and other data are included in the literal's data structure itself. So, unlike C, assigning a string literal in .NET is just a matter of assigning the memory address of the whole string data structure. In C, we would need to use new with a string class, as it needs to allocate other data structures around the null-terminated string, the length for instance.

Michael Buen