
I've implemented a foreach-loop and a while-loop that should create pretty much the same IL code.

The IL code (generated with compiler version 12.0.40629 for C# 5) is indeed almost identical (apart from the natural differences in some numbers and the like), yet the decompilers were able to reproduce the original code.

What's the key difference that allows a decompiler to tell that the former code block is a foreach-loop while the latter one represents a while-loop?

The decompiled code that I provide below was generated with the latest version (as of today) of ILSpy (2.3.1.1855), but I also tried JustDecompile, .NET Reflector, and dotPeek, with no difference. I didn't configure anything; I just used the tools as they are installed.

Original code:

using System;
using System.Collections.Generic;

namespace ForeachVersusWhile
{
    public class Program
    {
        public static void Main(string[] args)
        {
            var x = new List<int> {1, 2};
            foreach (var item in x)
            {
                Console.WriteLine(item);
            }

            using (var enumerator = x.GetEnumerator())
            {
                while (enumerator.MoveNext())
                {
                    Console.WriteLine(enumerator.Current);
                }
            }
        }
    }
}

Decompiled code:

List<int> x = new List<int>
{
    1,
    2
};
foreach (int item in x)
{
    Console.WriteLine(item);
}
using (List<int>.Enumerator enumerator = x.GetEnumerator())
{
    while (enumerator.MoveNext())
    {
        Console.WriteLine(enumerator.Current);
    }
}

IL Code (loops only):

[...]
IL_0016: ldloc.0
IL_0017: callvirt instance valuetype [mscorlib]System.Collections.Generic.List`1/Enumerator<!0> class [mscorlib]System.Collections.Generic.List`1<int32>::GetEnumerator()
IL_001c: stloc.s CS$5$0000
.try
{
    IL_001e: br.s IL_002e
    // loop start (head: IL_002e)
        IL_0020: ldloca.s CS$5$0000
        IL_0022: call instance !0 valuetype [mscorlib]System.Collections.Generic.List`1/Enumerator<int32>::get_Current()
        IL_0027: stloc.1
        IL_0028: ldloc.1
        IL_0029: call void [mscorlib]System.Console::WriteLine(int32)

        IL_002e: ldloca.s CS$5$0000
        IL_0030: call instance bool valuetype [mscorlib]System.Collections.Generic.List`1/Enumerator<int32>::MoveNext()
        IL_0035: brtrue.s IL_0020
    // end loop

    IL_0037: leave.s IL_0047
} // end .try
finally
{
    IL_0039: ldloca.s CS$5$0000
    IL_003b: constrained. valuetype [mscorlib]System.Collections.Generic.List`1/Enumerator<int32>
    IL_0041: callvirt instance void [mscorlib]System.IDisposable::Dispose()
    IL_0046: endfinally
} // end handler

IL_0047: ldloc.0
IL_0048: callvirt instance valuetype [mscorlib]System.Collections.Generic.List`1/Enumerator<!0> class [mscorlib]System.Collections.Generic.List`1<int32>::GetEnumerator()
IL_004d: stloc.2
.try
{
    IL_004e: br.s IL_005c
    // loop start (head: IL_005c)
        IL_0050: ldloca.s enumerator
        IL_0052: call instance !0 valuetype [mscorlib]System.Collections.Generic.List`1/Enumerator<int32>::get_Current()
        IL_0057: call void [mscorlib]System.Console::WriteLine(int32)

        IL_005c: ldloca.s enumerator
        IL_005e: call instance bool valuetype [mscorlib]System.Collections.Generic.List`1/Enumerator<int32>::MoveNext()
        IL_0063: brtrue.s IL_0050
    // end loop

    IL_0065: leave.s IL_0075
} // end .try
finally
{
    IL_0067: ldloca.s enumerator
    IL_0069: constrained. valuetype [mscorlib]System.Collections.Generic.List`1/Enumerator<int32>
    IL_006f: callvirt instance void [mscorlib]System.IDisposable::Dispose()
    IL_0074: endfinally
} // end handler

Background to the question:

I've read an article where they took a look at what C# code gets compiled to. In the first step they looked at a simple example: the foreach-loop.

Backed up by MSDN, a foreach loop is supposed to "hide the complexity of the enumerators". IL code doesn't know anything of a foreach-loop. So, my understanding is that, under the hood, the IL code of a foreach-loop equals a while-loop using IEnumerator.MoveNext.
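
As a rough sketch of that understanding (a hand-written approximation, not the compiler's exact output), the loop over a List<int> can be written both ways and compared. The class and method names below are made up for illustration. Note that the hand expansion stores Current in an ordinary local before using it.

```csharp
using System;
using System.Collections.Generic;

public static class ForeachExpansion
{
    // The loop as written with foreach.
    public static List<int> ViaForeach(List<int> x)
    {
        var seen = new List<int>();
        foreach (var item in x)
        {
            seen.Add(item);
        }
        return seen;
    }

    // Hand-written approximation of what the compiler emits for the loop above.
    public static List<int> ViaExpansion(List<int> x)
    {
        var seen = new List<int>();
        List<int>.Enumerator enumerator = x.GetEnumerator();
        try
        {
            while (enumerator.MoveNext())
            {
                // The foreach loop variable becomes an ordinary local.
                int item = enumerator.Current;
                seen.Add(item);
            }
        }
        finally
        {
            // Corresponds to the finally block that calls Dispose in the IL.
            enumerator.Dispose();
        }
        return seen;
    }

    public static void Main()
    {
        var x = new List<int> { 1, 2 };
        Console.WriteLine(string.Join(",", ViaForeach(x)));   // prints "1,2"
        Console.WriteLine(string.Join(",", ViaExpansion(x))); // prints "1,2"
    }
}
```

Both methods visit the same elements in the same order; whether the emitted IL is byte-for-byte identical depends on the compiler version and its settings.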

Because the IL code doesn't represent a foreach-loop, a decompiler can hardly tell that a foreach-loop was used. That raised a couple of questions where people wondered why they saw a while-loop when they decompiled their own code. Here's one example.

I wanted to see that myself, so I wrote a small program with a foreach-loop and compiled it. Then I used a decompiler to see what the code looks like. I wasn't expecting a foreach-loop, and was surprised when I actually got one.

The pure IL code, naturally, contained calls of IEnumerator.MoveNext etc.

I suppose I'm doing something wrong and thereby giving the tools access to more information, which in turn lets them correctly tell that I was using a foreach-loop. So, why am I seeing a foreach-loop instead of a while-loop using IEnumerator.MoveNext?

Em1
  • Which source do you rely on which lets you assume that a while-loop is created instead? – MakePeaceGreatAgain Jan 08 '16 at 13:36
  • Presumably decompiling tools were not smart enough at the time when people saw the while loop, and now they have become smart enough to recognize a foreach loop! Btw, where are the sources that say they found a while-loop instead? – Sriram Sakthivel Jan 08 '16 at 13:37
  • There's no such thing as a `while` loop in IL. There's just conditional branching... basically it's entirely feasible for two pieces of C# to produce the same IL. Note that in Reflector at least, you can tell it a level of "optimization" in terms of C# version - but I don't know whether that would affect `foreach`. – Jon Skeet Jan 08 '16 at 13:37
  • Because the tools are *designed* to try to give you back reasonable C# code? So they're taught to recognize compiler tricks and undo them? – Damien_The_Unbeliever Jan 08 '16 at 13:37
  • what is your problem with foreach? just leave it alone ;) – M.kazem Akhgary Jan 08 '16 at 13:41
  • @JonSkeet That's the point. If two different C# pieces produce the same IL, how can decompiler tell which one I was using. As foreach is enabled by implementing IEnumerable and returning an IEnumerator, how are decompiler able to tell that I used a foreach-loop and not a while-loop, `while(enumerator.MoveNext)` that is? – Em1 Jan 08 '16 at 13:56
  • @Em1: It can't. What makes you think it can? Can you produce two programs that generate the same IL but are decompiled back to their original forms? If so, that should be in the question. – Jon Skeet Jan 08 '16 at 13:57
  • @JonSkeet My English must be terrible. I'm actually saying the opposite to what you seem to understand. Yes, it can't, that's my point exactly. Foreach is not a construct supported by IL. So, I think my question ultimately is how's it possible that some people see there foreach getting a while loop while others don't. Why are decompiler sometimes not able to convert an originally foreach-loop back into a foreach-loop; and at other times they are. In my opinion is a IL construct either identifiable as foreach or not, but not sometimes yes and sometimes not. – Em1 Jan 08 '16 at 14:24
  • @Em1: And the answer is that it's likely that they're using different versions of the software, or decompiling subtly different code. We can't really tell anything concrete without details. – Jon Skeet Jan 08 '16 at 14:25
  • @JonSkeet I highly doubt that the software versions are the key to the answer. Anyway, I've provided the IL code for a foreach-loop as well as for a while-loop. I'm not too experienced with IL code but it looks quite the same to me (except for different numbers, of course). And though, the decompiler could tell a difference. – Em1 Jan 08 '16 at 14:47
  • @Em1: You don't think software like Reflector changes its heuristics over different versions? I'm pretty sure it does. Then there's C# compiler versions, compiler settings (debug vs not etc). And in your example, you removed the try/finally, which would be part of the heuristic the compiler uses... try putting the `var enumerator = x.GetEnumerator();` as a `using` statement instead, to better mimic a `foreach` loop. It's still very unclear what you're asking here, I'm afraid. – Jon Skeet Jan 08 '16 at 14:50
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/100161/discussion-between-em1-and-jon-skeet). – Em1 Jan 08 '16 at 14:53
  • Is there a PDB file that the decompiler knows about? – Eric Lippert Jan 08 '16 at 15:10
  • The most obvious difference to my eye between the two code fragments is that the first declares a loop variable and the second does not. – Eric Lippert Jan 08 '16 at 15:11
  • @EricLippert That was my first idea, and I deleted the pdb file and move the dll to my desktop. I hope this is safe enough ;) — I've noticed the difference "CS$5$0000" vs "enumerator" and an additional call of "stloc.1" and "ldloc.1". But this should always have been that way (at least .NET 2). – Em1 Jan 08 '16 at 15:11
  • This is considerably better, but turning it into a complete program (copy/paste/compile) and telling us your compiler settings would help a lot. – Jon Skeet Jan 08 '16 at 15:13
  • @JonSkeet Hahah... it is the complete program (do you really want to bother with a line like `public static void Main(string[] args)`) ;). Anyhow. Standard compiler setting, if this is an answer to that point. The only thing I did was installing .NET, VS and starting VS ;). Well, and it's a Release build. And .NET 4.5. Nothing spectacular. – Em1 Jan 08 '16 at 15:17
  • @Em1: Yes, I *do* want that, because it makes it easier for everyone to copy/paste/compile. I've managed to fool it from the command line, btw - `csc /o+ /debug- Test.cs` makes Reflector 9 show both as `while` loops, whereas with `csc /o- /debug- Test.cs` we still get a `foreach` loop. Looking into the difference now... – Jon Skeet Jan 08 '16 at 15:22
  • @JonSkeet Fair enough. Updating original code. — I've never compiled from the command line. I don't know what o+ and o- makes a difference, but "o-" is the default setting, isn't it? – Em1 Jan 08 '16 at 15:28
  • @Em1: I believe it depends on the configuration. Anyway, see my answer :) – Jon Skeet Jan 08 '16 at 15:34

2 Answers


Here's the code I compiled, which made it slightly easier to look at the differences:

using System;
using System.Collections.Generic;

class Test
{
    static void Main() {} // Just to make it simpler to compile

    public static void ForEach(List<int> x)
    {        
        foreach (var item in x)
        {
            Console.WriteLine(item);
        }
    }

    public static void While(List<int> x)
    {
        using (var enumerator = x.GetEnumerator())
        {
            while (enumerator.MoveNext())
            {
                Console.WriteLine(enumerator.Current);
            }
        }
    }
}

I'm using Roslyn, via VS2015 update 1 - version 1.1.0.51109.

Compiling with csc /o- /debug- Test.cs

In this case, Reflector 9.0.1.318 can tell the difference... and so can I. The locals for the foreach loop are:

.locals init (valuetype [mscorlib]System.Collections.Generic.List`1/Enumerator<int32> V_0,
       int32 V_1)

But the locals for the while loop are:

.locals init (valuetype [mscorlib]System.Collections.Generic.List`1/Enumerator<int32> V_0,
       bool V_1)

In the while loop, there's a stloc.1/ldloc.1 pair with the result of MoveNext(), but not with the result of Current... whereas in the foreach it's the other way round.
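
The declared locals can also be listed without a disassembler, via reflection. The following is only a minimal sketch: `LocalsDump` and `LocalTypes` are names invented here, and what it prints depends on the compiler version and on the `/o` and `/debug` settings used to build it.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

public class LocalsDump
{
    public static void ForEach(List<int> x)
    {
        foreach (var item in x)
        {
            Console.WriteLine(item);
        }
    }

    public static void While(List<int> x)
    {
        using (var enumerator = x.GetEnumerator())
        {
            while (enumerator.MoveNext())
            {
                Console.WriteLine(enumerator.Current);
            }
        }
    }

    // Returns the declared local variable types of a public static method
    // of this class, in slot order (the same information as .locals init).
    public static List<Type> LocalTypes(string methodName)
    {
        MethodInfo method = typeof(LocalsDump).GetMethod(methodName);
        MethodBody body = method.GetMethodBody();
        var result = new List<Type>();
        foreach (LocalVariableInfo local in body.LocalVariables)
        {
            result.Add(local.LocalType);
        }
        return result;
    }

    public static void Main()
    {
        foreach (string name in new[] { "ForEach", "While" })
        {
            Console.WriteLine(name + ": " + string.Join(", ", LocalTypes(name)));
        }
    }
}
```

Building this once with `/o-` and once with `/o+` and comparing the two printed lines should show whether the extra `int32` or `bool` slot is present for a given compiler.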

Compiling with csc /o+ /debug- Test.cs

In this case, Reflector showed a while loop in both cases, and the IL really was identical. There's no stloc.1/ldloc.1 pair in either loop.

Your IL

Looking at the IL that your compilation has come up with - again, there's the stloc.1/ldloc.1 pair for the Current property in the foreach loop.

Hand-crafted IL

I took the IL from the "can't tell the difference version" and just changed the .locals part and added stloc.1/ldloc.1 into the mix, and bingo - Reflector thought it was a foreach loop again.

So basically, while I don't know about other decompilers, it looks like Reflector uses what you do with the Current call as a signal.

Validation

I changed the While method to:

public static void While(List<int> x)
{        
    using (var enumerator = x.GetEnumerator())
    {
        while (enumerator.MoveNext())
        {
            int item = enumerator.Current;
            Console.WriteLine(item);
        }
    }
}

Now even with csc /o- /debug+, Reflector thinks the while loop is actually a foreach loop.

Jon Skeet
  • There's one thing bothering me. Namely, that the "stloc.1/ldloc1" pair is missing in the while-loop in my code, whereas you're saying that you have it in both code pieces (though in different positions). That doesn't make any sense, does it? — I will now have to try a couple of things, enabling optimization for one, and changing the while-loop the way you did, for another. – Em1 Jan 08 '16 at 15:44
  • @Em1: I can't claim to know exactly why your version doesn't have it, but then I don't know which exact version of the compiler you're using. (And compiling from the command line definitely makes it simpler to see which options you're using.) – Jon Skeet Jan 08 '16 at 15:47
  • Once I installed VS2015, I could get the same results as you show here. But "csc /o+" with my older version (12.0.40629 for C#5) wouldn't generate IL code that 'removed' the foreach. It seems to me that the optimization to remove the int variable in a foreach-loop was implemented recently, while removing the bool variable in a while-loop has always been there. And decompilers apparently interpret the presence of an int variable as an indicator for a foreach-loop (as you showed in your last paragraph), and the absence of it as an indicator for a while-loop. Thanks for your great help. – Em1 Jan 11 '16 at 13:20

Jon Skeet helped me a lot in understanding the difference. He mentioned the key points, but in a somewhat more elaborate way, so for potential future readers I'd like to put it in different words.

When non-optimized, a foreach-loop internally consists of (up to) three variables.

  • The enumerator, which is necessary for the iterating,
  • a bool variable for telling whether the call of MoveNext returned true or false,
  • and an int variable that stores the current value.
.locals init (
    [0] int32,
    [1] valuetype [mscorlib]System.Collections.Generic.List`1/Enumerator,
    [2] bool
)

Note that the bool variable is not generated by all compiler versions. The code potentially just consists of the enumerator and the int variable.

The while-loop, in contrast, doesn't have that int variable.

.locals init (
    [0] valuetype [mscorlib]System.Collections.Generic.List`1/Enumerator,
    [1] bool
)

Decompilers use the presence of this additional int variable as the signal to represent the code as a foreach-loop. This can be validated, as shown by Jon Skeet, by adding that variable to the while-loop.

int item = enumerator.Current;

When decompiling the respective IL code, a decompiler shows a foreach-loop where a while-loop was actually used.

However, neither the int nor the bool variable is necessary. In the IL code you can see that each value is popped from the stack into a local variable and then immediately pushed back onto the stack.

stloc.1
ldloc.1

When the code is optimized, both can be removed. Once the int variable is no longer present, decompilers represent the IL as a while-loop.

That being said, not all compiler versions remove the int variable. Older versions only removed the bool variable and, hence, decompilers can tell the two loops apart.

Em1