8

Consider the following code:

enum Test {
    OptionOne,
    OptionTwo
}

List<string> testValues = new List<string> { ... } // A huge collection of strings

foreach(var val in testValues)
{
    if(val == Test.OptionOne.ToString()) // **Here**
    {
        // Do something
    }
}

Will the compiler optimize the calls to Test.OptionOne.ToString() or will it call it for each item in the testValues collection?

Rat Salad
  • 873
  • 1
  • 7
  • 11
  • how can it optimize it? of cause, it will call for each item! – Backs Feb 02 '16 at 17:11
  • @Backs `Test.OptionOne.ToString()` is a fixed string that will never change. It could optimize it by generating a single const string and comparing against that instead of generating a new string each loop. – Scott Chamberlain Feb 02 '16 at 17:14
  • 2
    @Backs - that's not really helpful, after all that's the exact proposition that the poster is questioning here. All you've done is asserted one position without actually providing any evidence or rational. – antiduh Feb 02 '16 at 17:15
  • 1
    It will not optimize the call. You'll have to do it manually hoisting the call out of the loop yourself. – InBetween Feb 02 '16 at 17:29
  • Optimizations like these is the job of the jitter, not the C# compiler. ToString() is a virtual method, the jitter optimizer does not try to guess what specific method is going to run. Nor is it pretty, it has to find out if the [Flags] attribute is applied. The framework uses caching to amortize the considerable cost. – Hans Passant Feb 02 '16 at 17:43

2 Answers2

8

Your question is one of loop-invariants analysis - can the compiler detect some expression in a loop that does not depend on the state of the loop for its evaluation, and has no side effects?

There's good reason to hope that the compiler could satisfy both propositions - the compiler could be smart enough to know that calling ToString() on an enumeration doesn't change, ever; and that calling ToString() on an enumeration doesn't have any appreciable side effects.

There might be reasons why the compiler would actively decide not to hoist the invariant - perhaps invoking the function is somehow faster than storing an extra variable on the stack.

The question boils down to whether or not it does.

I compiled the following program using VS2012 targeting .Net 4.6, and compiled with optimizations enabled. It appears that the compiler did not choose to hoist the invariant out of the loop:

    public static void Main()
    {
        for( int i = 0; i < 10; i++ )
        {
            Console.Out.WriteLine( i );
            Console.Out.WriteLine( Test.Option1.ToString() );
        }
    }

    public enum Test
    {
        Option1,
        Option2,
        Option3
    }

Here's the raw IL from the program that I obtained using ILSpy 2.3.1. Note the call to ToString(), right in the middle of the loop.

.method public hidebysig static 
    void Main () cil managed 
{
    .custom instance void [mscorlib]System.STAThreadAttribute::.ctor() = (
        01 00 00 00
    )
    // Method begins at RVA 0x2050
    // Code size 46 (0x2e)
    .maxstack 2
    .entrypoint
    .locals init (
        [0] int32 i
    )

    IL_0000: ldc.i4.0
    IL_0001: stloc.0
    IL_0002: br.s IL_0028
    // loop start (head: IL_0028)
        IL_0004: call class [mscorlib]System.IO.TextWriter [mscorlib]System.Console::get_Out()
        IL_0009: ldloc.0
        IL_000a: callvirt instance void [mscorlib]System.IO.TextWriter::WriteLine(int32)
        IL_000f: call class [mscorlib]System.IO.TextWriter [mscorlib]System.Console::get_Out()
        IL_0014: ldc.i4.0
        IL_0015: box TestProject.Program/Test
--->    IL_001a: callvirt instance string [mscorlib]System.Object::ToString()
        IL_001f: callvirt instance void [mscorlib]System.IO.TextWriter::WriteLine(string)
        IL_0024: ldloc.0
        IL_0025: ldc.i4.1
        IL_0026: add
        IL_0027: stloc.0

        IL_0028: ldloc.0
        IL_0029: ldc.i4.s 10
        IL_002b: blt.s IL_0004
    // end loop

    IL_002d: ret
} // end of method Program::Main

I also got curious to see if the runtime JITer would hoist the invariant, but it too seems not to. I changed the code to the following, to make the assembly cleaner:

    public static void Main()
    {
        TextWriter cons = Console.Out;
        for( int i = 0; i < 10; i++ )
        {
            cons.WriteLine( i );
            cons.WriteLine( Test.Option1.ToString() );
        }
    }

And then used VS's debugger to get the assembly, being careful to make sure VS allowed the JITer to optimize. It still did not hoist the call to ToString():

            TextWriter cons = Console.Out;
00000000  push        rdi 
00000001  push        rsi 
00000002  sub         rsp,28h 
00000006  call        0000000050D76460 
0000000b  mov         rsi,rax 
            for( int i = 0; i < 10; i++ )
0000000e  xor         edi,edi 
            {
                cons.WriteLine( i );
00000010  mov         rcx,rsi 
00000013  mov         edx,edi 
00000015  mov         rax,qword ptr [rsi] 
00000018  mov         rax,qword ptr [rax+60h] 
0000001c  call        qword ptr [rax+28h] 
                cons.WriteLine( Test.Option1.ToString() );
0000001f  mov         rcx,7FE90116770h 
00000029  call        000000005F6302D0 
0000002e  mov         rcx,rsi 
00000031  xor         ecx,ecx 
00000033  mov         dword ptr [rax+8],ecx 
00000036  mov         rcx,rax 
00000039  mov         rax,qword ptr [rax] 
0000003c  mov         rax,qword ptr [rax+40h] 
00000040  call        qword ptr [rax]            <---- call System.Enum.ToString()
00000042  mov         rdx,rax 
00000045  mov         rcx,rsi 
00000048  mov         rax,qword ptr [rsi] 
0000004b  mov         rax,qword ptr [rax+68h] 
0000004f  call        qword ptr [rax+20h] 
            for( int i = 0; i < 10; i++ )
00000052  inc         edi 
00000054  cmp         edi,0Ah 
00000057  jl          0000000000000010 
00000059  add         rsp,28h 
            }
        }
antiduh
  • 11,853
  • 4
  • 43
  • 66
1

No, but you can greatly reduce the complexity by doing something like this:

using System.Linq;

var testValues = new List<string> { ... }; // A huge collection of strings
var testDict = testValue.ToDictionary(elem => elem, elem => true);

var needle = Test.OptionOne.ToString();
if (testDict.ContainsKey(needle))
{
   // do something
}

Dictionary value retrieval has a complexity of O(1).

I think an alternative is HashSet, as you have only keys. An interesting question and answers regarding building a HashSet from a list can be found here.

[edit] Following Scott's comment, I will include option using Hashset (also in O(1)):

var testHash = new HashSet<string>(testValues);
if (testHash.Contains(needle))
{
   // do something
}

Based on btlog's correct observation, example code will fail for duplicates. It can be circumvented by either:

  • build directly the HashSet when fetching data (avoid the list in the first place)
  • obtain a second list having distinct values by using Distinct

However, the second approach raises the complexity to O(N)

Community
  • 1
  • 1
Alexei - check Codidact
  • 22,016
  • 16
  • 145
  • 164
  • Also the reason there is no `.ToHashSet()` is you can just do `new HashSet(testValues)`. (I would have just gone with the hash set instead of the dictionary) – Scott Chamberlain Feb 02 '16 at 17:20
  • 1
    Both of these suffer from the an assumption that either there are no duplicates in the original list as the ToDictionary will throw an exception if that is the case, or that duplicates aren't important as HashSet cannot have duplicate values. – btlog Feb 02 '16 at 17:26
  • Building Dictionary is at best O(N) so for one searched word the complexity can not decrease. Actually it increases, so does the memory consumption. – Antonín Lejsek Feb 08 '16 at 05:32