57

I want to understand the process and internals of string interning specific to the .NET Framework. I would also like to know the benefits of interning and the scenarios in which we should use it to improve performance. I have studied interning in Jeffrey Richter's CLR book, but I am still confused and would like to understand it in more detail.

[Edit] To ask a more specific question, here is some sample code:

private void MethodA()
{
    string s = "String"; // line 1 - interned literal as explained in the answer

    //s = string.Intern(s); // line 2 - what would happen in line 3 if we uncomment this line, will it make any difference?
}

private bool MethodB(string compareThis)
{
    if (compareThis == "String") // line 3 - will this line use interning (with and without uncommenting line 2 above)?
    {
        return true;
    }
    return false;
}
Guru Stron
S2S2

5 Answers

45

In general, interning is something that just happens, automatically, when you use literal string values. Interning provides the benefit of only having one copy of the literal in memory, no matter how often it's used.

That being said, there is rarely a reason to intern strings that you generate at runtime, or even to think about string interning during normal development.

There are potentially some benefits if you're going to be doing a lot of work with comparisons of potentially identical runtime generated strings (as interning can speed up comparisons via ReferenceEquals). However, this is a highly specialized usage, and would require a fair amount of profiling and testing, and wouldn't be an optimization I'd consider unless there was a measured problem in place.
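To make the distinction between literals and runtime-generated strings concrete, here is a minimal sketch. The class name `InternDemo` and the helper `Copy` are made up for illustration, and the last line assumes a default build in which literals land in the intern pool.

```csharp
using System;

public static class InternDemo
{
    // Hypothetical helper: returns a new string object with the same characters,
    // which is what a runtime-generated string looks like to the CLR.
    public static string Copy(string s)
    {
        return new string(s.ToCharArray());
    }

    public static void Main()
    {
        string literal = "String";        // literal: interned automatically
        string built = Copy(literal);     // equal content, but a separate object

        // Interning an already interned literal changes nothing.
        Console.WriteLine(object.ReferenceEquals(literal, string.Intern(literal))); // True

        Console.WriteLine(literal == built);                      // True: == compares values
        Console.WriteLine(object.ReferenceEquals(literal, built)); // False: two objects

        // Interning the runtime string hands back the pooled instance,
        // which on a default build is the literal itself.
        string pooled = string.Intern(built);
        Console.WriteLine(object.ReferenceEquals(literal, pooled));
    }
}
```

Applied to the question's sample: the literal in MethodB is interned either way, and the `==` on line 3 gives the same answer whether or not `compareThis` was interned.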

Reed Copsey
  • 1
    @Vijay: Calling intern on that string will have no effect - it's already an interned string (since it's assigned to a literal). The literal in MethodB will also be an interned string (all literal strings are interned automatically). – Reed Copsey Nov 09 '11 at 17:29
  • 1
    You have missed another important use case discussed in some other answers. If you are storing a truly gigantic amount of data that has many of the same string, there can be a large memory savings. This was a lifesaver for me when I needed to load and keep in memory very large (multiple gigabyte) data files containing many repeated strings. – user12861 Jan 07 '23 at 03:53
28

This is an "old" question, but I have a different angle on it.

If you're going to have a lot of long-lived strings from a small pool, interning can improve memory efficiency.

In my case, I was interning another type of object in a static dictionary because they were reused frequently, and this served as a fast cache before persisting them to disk.

Most of the fields in these objects are strings, and the pool of values is fairly small (much smaller than the number of instances, anyway).

If these were transient objects, it wouldn't matter because the string fields would be garbage collected often. But because references to them were being held, their memory usage started to accumulate (even when no new unique values were being added).

So interning the objects reduced the memory usage substantially, and so did interning their string values while they were being interned.
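A rough sketch of that kind of setup follows. It is not the original code: the `Country` type, the `CountryPool` class and the choice of `ConcurrentDictionary` are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical stand-in for the long-lived objects whose fields are mostly strings.
public sealed class Country
{
    private readonly string name;
    public Country(string name) { this.name = name; }
    public string Name { get { return name; } }
}

public static class CountryPool
{
    // Static cache: one Country instance per distinct name.
    private static readonly ConcurrentDictionary<string, Country> Pool =
        new ConcurrentDictionary<string, Country>(StringComparer.Ordinal);

    public static Country Get(string name)
    {
        // Intern the string first so the dictionary key and the object's field
        // share one instance instead of each keeping its own copy.
        string key = string.Intern(name);
        return Pool.GetOrAdd(key, k => new Country(k));
    }

    public static void Main()
    {
        // Two separately allocated strings with the same content.
        Country a = Get(new string("Norway".ToCharArray()));
        Country b = Get(new string("Norway".ToCharArray()));

        Console.WriteLine(object.ReferenceEquals(a, b));           // True: one pooled object
        Console.WriteLine(object.ReferenceEquals(a.Name, b.Name)); // True: one pooled string
    }
}
```

This only pays off under the conditions described above: the instances are long-lived and the set of distinct values is much smaller than the number of instances.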

harpo
27

Interning is an internal implementation detail. Unlike boxing, I do not think there is any benefit in knowing more than what you have read in Richter's book.

The micro-optimisation benefits of interning strings manually are minimal, so it is generally not recommended.

This probably describes it:

using System;

class Program
{
    const string SomeString = "Some String"; // compile-time constant: gets interned

    static void Main(string[] args)
    {
        var s1 = SomeString; // reference to the interned string
        var s2 = SomeString; // same reference as s1
        var s = "String";    // literal: also interned
        var s3 = "Some " + s; // no interning: concatenated at run time into a new string

        Console.WriteLine(s1 == s2); // True: same reference, so the comparison short-circuits
        Console.WriteLine(s1 == s3); // True: different references, so the characters are compared
    }
}
Corio
Aliostad
  • 23
    Just FYI - Your "no interning" line is going to still use two interned strings to generate the non-interned string. Also, string's comparisons always use the same comparison (there is no "interning comparison" or "other comparison") - but there's a short circuit that detects if the members point to the same instance. – Reed Copsey Nov 08 '11 at 17:34
  • Yes, constants and literals get interned. Cheers – Aliostad Nov 08 '11 at 17:55
  • 1
    @Aliostad - So for understanding, after the 'no interning' line; if we want to intern the s3 variable we would need to use s3.intern() and then the s1 == s3 comparison would use interning comparison - right? – S2S2 Nov 09 '11 at 05:02
  • 16
    Being blind to implementation details is a bad thing. Consider that many people are currently using work-arounds due to the perceived lack of string interning. Knowing that it exists and where it can improve the performance of your code might actually allow you to remove 'micro-optimisations' which are already in place, ones which trade performance for readability. Edit: I suppose there are two schools of thought regarding implementation details but many would argue that a good programmer's knowledge goes as far down the stack as possible, and especially to the idiosyncrasies of the compiler – Sprague Mar 07 '13 at 08:24
  • If you add compilers from C# to other platforms/languages to the mix, it's better not to assume any internal behaviour – George Birbilis Jan 28 '16 at 18:37
20

Interned strings have the following characteristics:

  • Two interned strings that are identical will have the same address in memory.
  • Memory occupied by interned strings is not freed until your application terminates.
  • Interning a string involves calculating a hash and looking it up in a dictionary which consumes CPU cycles.
  • If multiple threads intern strings at the same time they will block each other because accesses to the dictionary of interned strings are serialized.

The consequences of these characteristics are:

  • You can test two interned strings for equality by just comparing their addresses, which is a lot faster than comparing each character in the string. This is especially true if the strings are very long and start with the same characters. You can compare interned strings with the Object.ReferenceEquals method, but it is safer to use the string == operator: it first checks whether both references point to the same instance, and falls back to comparing the characters if they do not, so it still gives the right answer when one of the strings turns out not to be interned.

  • If you use the same string many times in your application, your application will only store one copy of the string in memory reducing the memory required to run your application.

  • If you intern many different strings this will allocate memory for those strings that will never be freed, and your application will consume ever increasing amounts of memory.

  • If you have a very large number of interned strings, string interning can become slow, and threads will block each other when accessing the interned string dictionary.

You should use string interning only if:

  1. The set of strings you are interning is fairly small.
  2. You compare these strings many times for each time that you intern them.
  3. You really care about minute performance optimizations.
  4. You don't have many threads aggressively interning strings.
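The difference between the two comparisons can be sketched as follows. `EqualsLikeOperator` is a hypothetical helper that only approximates the behaviour described above; it is not the actual framework source.

```csharp
using System;

public static class EqualityDemo
{
    // Approximation of a value comparison with a reference short-circuit.
    public static bool EqualsLikeOperator(string a, string b)
    {
        if (object.ReferenceEquals(a, b)) return true;          // same instance, or both null
        if ((object)a == null || (object)b == null) return false;
        if (a.Length != b.Length) return false;                  // cheap rejection
        return string.CompareOrdinal(a, b) == 0;                 // character-by-character
    }

    public static void Main()
    {
        // Two long strings with identical content, allocated separately.
        string a = new string('x', 1000);
        string b = new string('x', 1000);

        Console.WriteLine(a == b);                        // True
        Console.WriteLine(object.ReferenceEquals(a, b));  // False: not interned
        Console.WriteLine(EqualsLikeOperator(a, b));      // True: falls through to the characters

        // After interning, both variables refer to the single pooled instance.
        string ia = string.Intern(a);
        string ib = string.Intern(b);
        Console.WriteLine(object.ReferenceEquals(ia, ib)); // True
    }
}
```

Using ReferenceEquals alone is only correct when every string involved is known to be interned, which is why == is the safer default.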
nothrow
bikeman868
16

Interning strings affects memory consumption.

For example, if you read strings and keep them in a list for caching, and the exact same string occurs 10 times, the string is stored only once in memory if string.Intern is used. If not, the string is stored 10 times.

In the example below, the string.Intern variant consumes about 44 MB, while the variant without it (the commented-out line) consumes 1195 MB.

static void Main(string[] args)
{
    var list = new List<string>();

    for (int i = 0; i < 5 * 1000 * 1000; i++)
    {
        var s = ReadFromDb();
        list.Add(string.Intern(s));
        //list.Add(s);
    }

    Console.WriteLine(Process.GetCurrentProcess().PrivateMemorySize64 / 1024 / 1024 + " MB");
}

private static string ReadFromDb()
{
    return "abcdefghijklmnopqrstuvyxz0123456789abcdefghijklmnopqrstuvyxz0123456789abcdefghijklmnopqrstuvyxz0123456789" + 1;
}

Interning also improves the performance of equality comparisons. In the example below, the interned version takes about 1 time unit, while the non-interned version takes about 7.

static void Main(string[] args)
{
    var a = string.Intern(ReadFromDb());
    var b = string.Intern(ReadFromDb());
    //var a = ReadFromDb();
    //var b = ReadFromDb();

    int equals = 0;
    var stopwatch = Stopwatch.StartNew();
    for (int i = 0; i < 250 * 1000 * 1000; i++)
    {
        if (a == b) equals++;
    }
    stopwatch.Stop();

    Console.WriteLine(stopwatch.Elapsed + ", equals: " + equals);
}
J. Andersen
  • 1
    Why are not these strings interned by default by C# optimizer since they are the same? – Serg Mar 30 '19 at 07:14
  • 1
    Interned strings are kept in memory and are not freed until the process terminates, so they carry a cost. Intern only if you will be doing a lot of comparisons during a large part of the process lifetime, and only for a small number of strings, to keep the memory cost down. – J. Andersen Aug 15 '19 at 18:11
  • String literals are automatically interned by the compiler. Read my answer to understand why the optimizer does not automatically intern all strings – bikeman868 Nov 19 '20 at 04:43