Strange IL code emitted by some compiler

Question

I've been looking at some old source code, decompiled with Reflector, that I dug up. The DLL was originally compiled from Visual Basic .NET source, using .NET 2.0. Apart from that, I no longer have any information about the compiler.

At some point something strange happened. There was a branch in the code that wasn't followed, even though the condition should have held. To be exact, this was the branch:

[...]
if (item.Found > 0)
{
    [...]

Now, the interesting part was that if item.Found was -1, the scope of the if statement was entered. The type of item.Found was int.

To figure out what was going on, I went looking in the IL code and found this:

ldloc.3 
ldfld int32 Info::Found
ldc.i4.0 
cgt.un
stloc.s flag3
ldloc.s flag3
brfalse.s L_0024

Obviously Reflector was wrong here. The correct decompiled code should have been:

if ((uint)item.Found > (uint)0) 
{ ... }
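The difference between the two readings can be checked with a minimal C# sketch. The class and method names here are hypothetical and only illustrate the comparison; they are not taken from the decompiled DLL. Reinterpreted as unsigned, -1 becomes 4294967295, which is greater than 0.

```csharp
using System;

static class SignedVersusUnsignedDemo
{
    // Signed test, matching the C# that Reflector displayed.
    public static bool SignedGreaterThanZero(int value)
    {
        return value > 0;
    }

    // Unsigned test, matching what cgt.un does: the same 32 bits are
    // reinterpreted as an unsigned number before comparing.
    public static bool UnsignedGreaterThanZero(int value)
    {
        return unchecked((uint)value) > 0u;
    }

    static void Main()
    {
        foreach (int found in new[] { -1, 0 })
        {
            Console.WriteLine("Found = {0}: signed > 0 is {1}, unsigned > 0 is {2}",
                found, SignedGreaterThanZero(found), UnsignedGreaterThanZero(found));
        }
    }
}
```

Only the unsigned version enters the branch for -1, which matches the observed behaviour.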

OK so far for context. Now for my question.

First off, I cannot imagine someone actually writing this code; IMO no-one with a sane mind makes the distinction between '-1' and '0' this way - which are the only two values that 'Found' can have.

So, that leaves me with the conclusion that the compiler does something I do not understand.

Why on earth / in what context would a compiler generate IL code like this? What's the benefit of this check (instead of ceq or bne.un, which is what I would have expected and is normally generated by C#)?
And related: what was the original source code most likely?

Are you sure that the type of item.Found was int and not uint in the original source? Just double checking, since it isn't clear for me whether you have access to the original source or not — Denis Yarkovoy, Jun 11 '15 at 07:41
Almost certainly the source code was the VB equivalent of `if (item.Found)` where `Found` was converted from Boolean. You are asking reflector to show VB code as if it were C# code which can of course cause weirdies. What do you get if you tell reflector to show it as VB code? (Remember that VB represents Boolean as 0 or -1 when converted to numeric types) — Matthew Watson, Jun 11 '15 at 07:41
@MatthewWatson Yes, I tried that - if I show it as VB.Net nothing changes. That said, although Reflector does a decent job with C#, it's not particularly good with these kinds of constructions. Still, what you're saying is interesting - why would you compile that as 'cgt.un'? — atlaste, Jun 11 '15 at 07:50
I'm not sure why it would compile it as cgt.un, but I feel that it might be related to VB's bool->int conversions. — Matthew Watson, Jun 11 '15 at 07:52
Why is `-1` being used as the alternative to `0` anyway? Is this something that came from a VB6 (or earlier) boolean? — Jon Hanna, Jun 11 '15 at 09:18

Hans Passant · Answer 1 · 2015-06-11T10:56:56.640

Looks quirky but this is related to previous versions of Visual Basic, the generation that ended with VB6. It had a very different Boolean type representation, a VARIANT_BOOL. This still is a factor in VB.NET due to its need to support legacy code.

The value representation for True was different, it was -1. False is 0 like it is in .NET.

While that looks like a very quirky choice as well, since most other languages use 1 to represent True, there was a very good reason for it. It makes the distinction between the logical and the mathematical (bitwise) And and Or operators disappear, which is pretty nice: one more thing a programmer doesn't have to learn. That this is a learning obstacle is pretty evident from the kind of code most any C# programmer writes; they blindly apply && or || in their if() statements, even when it is not a good idea to do so. These operators are expensive due to the required short-circuiting branch in the machine code. If the left operand is poorly predicted by the processor's branch prediction, then you'll easily lose a bunch of CPU cycles due to the pipeline stall.

Nice but not without problems, And and Or always evaluate both left and right operands. And that has a knack for tripping exceptions, sometimes you really do need short-circuiting. VB.NET added the AndAlso and OrElse operators to fix that problem.
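As a rough illustration of why -1 makes the bitwise and logical operators coincide, here is a small C# sketch. The names are hypothetical, and plain int values stand in for the legacy Boolean representation. With all bits set for True, bitwise Not, And and Or produce exactly the logical results; with 1 as True they would not.

```csharp
using System;

static class LegacyTrueDemo
{
    public const int LegacyTrue = -1; // all 32 bits set
    public const int LegacyFalse = 0; // no bits set

    public static int BitwiseNot(int value) { return ~value; }
    public static int BitwiseAnd(int left, int right) { return left & right; }
    public static int BitwiseOr(int left, int right) { return left | right; }

    static void Main()
    {
        // With -1 as True, bitwise Not flips True and False exactly.
        Console.WriteLine(BitwiseNot(LegacyTrue));   // 0
        Console.WriteLine(BitwiseNot(LegacyFalse));  // -1

        // With 1 as True, bitwise Not gives -2, which is still non-zero.
        Console.WriteLine(BitwiseNot(1));            // -2

        // And/Or on the canonical values behave like their logical counterparts.
        Console.WriteLine(BitwiseAnd(LegacyTrue, LegacyFalse)); // 0
        Console.WriteLine(BitwiseOr(LegacyTrue, LegacyFalse));  // -1
    }
}
```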

So cgt.un makes sense: it can handle both a .NET Boolean value and a legacy value. It doesn't care whether the True value is -1 or 1, and it doesn't care whether the variable or expression is actually Boolean, which VB.NET permits with Option Strict Off.

And for that matter, both `VARIANT_BOOL` and .NET `Boolean` should treat any value that isn't zero as true (though some parts of .NET make an assumption it will never happen to `Boolean`). `cgt.un` will catch all of those too. — Jon Hanna, Jun 11 '15 at 09:32
On consideration, I don't think whether this was `VARIANT_BOOL` in origin or not is relevant, all that's relevant is whether it cares about zero vs non-zero. — Jon Hanna, Jun 11 '15 at 10:50
@HansPassant As always, thanks Hans, much appreciated; there's a bit of VB.Net details here I didn't know about. PS (just some extra info for those interested): lazy evaluation can sometimes actually be faster than non-lazy evaluation in these cases, that is: if the jump can be predicted properly etc, see also: http://oai.cwi.nl/oai/asset/21351/21351B.pdf for experiments in a database engine. — atlaste, Jun 11 '15 at 11:10

score 3 · Answer 2 · answered Jun 11 '15 at 07:57

As an experiment I compiled this VB code:

Dim test As Boolean
test = True
Dim x As Integer
x = test
If x Then Console.WriteLine("True")

The IL for the release version of this is:

.custom instance void [mscorlib]System.STAThreadAttribute::.ctor()
.entrypoint
.maxstack 2
.locals init (
    [0] bool test,
    [1] int32 x)
L_0000: ldc.i4.1 
L_0001: stloc.0 
L_0002: ldloc.0 
L_0003: ldc.i4.0 
L_0004: cgt.un 
L_0006: neg 
L_0007: stloc.1 
L_0008: ldloc.1 
L_0009: ldc.i4.0 
L_000a: cgt.un 
L_000c: brfalse.s L_0018
L_000e: ldstr "True"
L_0013: call void [mscorlib]System.Console::WriteLine(string)
L_0018: ret

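Note the use of cgt.un, both for the Boolean-to-Integer conversion (L_0004, followed by neg) and for the If test (L_000a).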

Reflector's interpretation as C# is:

bool test = true;
int x = (int) -(test > false);
if (x > 0x0)
{
    Console.WriteLine("True");
}

And as VB:

Dim test As Boolean = True
Dim x As Integer = CInt(-(test > False))
If (x > &H0) Then
    Console.WriteLine("True")
End If

Therefore I conclude the generated code is related to the conversion of the VB Boolean to a numeric value.
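To make the two cgt.un uses in that IL easier to follow, here is a hedged C# model of the sequence. The helper names are hypothetical, and this only mimics the instructions step by step; it is not what the VB compiler emits for C# source.

```csharp
using System;

static class BooleanToIntegerDemo
{
    // Mirrors "ldc.i4.0 / cgt.un": 1 if any bit is set, otherwise 0.
    public static int NonZeroFlag(int value)
    {
        return unchecked((uint)value) > 0u ? 1 : 0;
    }

    // Mirrors L_0002..L_0006: normalise to 0 or 1, then negate to get 0 or -1.
    public static int ToVbInteger(bool flag)
    {
        int raw = flag ? 1 : 0; // a bool is loaded as 0 or 1
        return -NonZeroFlag(raw);
    }

    // Mirrors L_0008..L_000a: "If x Then" treats any non-zero Integer as true.
    public static bool IsTrue(int x)
    {
        return NonZeroFlag(x) == 1;
    }

    static void Main()
    {
        int x = ToVbInteger(true);
        Console.WriteLine("x = {0}, If x Then branch taken: {1}", x, IsTrue(x));
    }
}
```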

*ough*. OK, this answers the second part of my question. Thanks for that! I'm just still very curious about the first part: why the compiler team decided it's a good idea to compile it like this. I mean: ceq_un and ceq feel like (a) a stronger constraint and (b) like they're meant for this. :-) — atlaste, Jun 11 '15 at 07:59
@atlaste I guess only the compiler team will know the answer to that one! — Matthew Watson, Jun 11 '15 at 07:59
Well, I'll offer one anyway ;) In the meantime, note that `ceq_un` doesn't exist; signed and unsigned equality are the same, so there's no need for `ceq_un`. — Jon Hanna, Jun 11 '15 at 10:11

Jon Hanna · Answer 3 · 2015-06-11T10:44:03.987

Let's first consider that there are, as you say, two possible values, -1 and 0. There's a question of what should be done if 42 ends up in there. Whether that is impossible (you are correct in your statement) or just about possible (the value acts like a VARIANT_BOOL, in which -1 is the normal true value but all non-zero values should be treated as true), it's worth considering either way. And it makes sense to treat 42 the same as we treat -1; that is, it makes sense to treat all non-zero values the same.

And even if there is absolutely no other possible non-zero value than -1 it still generalises to "test is non-zero" which is a very common case elsewhere, so it still makes sense to consider this a "test is non-zero" case. This is especially so if the compiler doesn't know -1 is the only possible non-zero value (very likely).

Now there is the question of whether to branch directly on the value (with brfalse, brtrue etc.) or to do a boolean operation and then branch on the result. Generally both the C# and VB.NET compilers will produce a boolean value and then branch on that in debug builds:

Simple Code:

public void TestBool(bool x)
{
  if(x)
    throw new ArgumentOutOfRangeException();
}

Debug CIL:

  nop
  ldarg.1
  ldc.i4.0
  ceq
  stloc.0
  ldloc.0
  brtrue.s NoError
  newobj instance void [mscorlib]System.ArgumentOutOfRangeException::.ctor()
  throw
NoError:
  ret

Release CIL:

  ldarg.1
  brfalse.s NoError
  newobj instance void [mscorlib]System.ArgumentOutOfRangeException::.ctor()
  throw
NoError:
  ret

The extra steps of essentially computing x == false and storing the result before branching aid debugging. Similar effects are sometimes seen in release code, though less often.

So, for this reason we have a comparison being done before the branch in your code, rather than just a branch.

Now there is another question, of whether we should test that the value is zero or test that the value is not zero; either is equivalent much as:

if(x)
  DoSomething();

And

if(!x)
{
}
else
  DoSomething();

Are equivalent.

For this reason ceq could have been used, with the subsequent branch arranged for the case where item.Found is 0. But it is, if anything, more sensible to use cne, with the subsequent branch arranged for the case where item.Found is not 0.

But there's no such CIL instruction as cne, or anything which comparably tests if something is not equal. Generally to do "check not equal" we do a sequence ceq, ldc.i4.0, ceq; check two values are equal and then check that the result of that check is false.

Luckily, in the common case where the value we are comparing against is 0, we don't need cne, because cgt.un is logically equivalent to a hypothetical cne in this case: treated as unsigned, every non-zero value is greater than 0. This makes cgt.un the obvious choice when we want to test that something isn't zero.
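The equivalence can be spot-checked with a short C# sketch that models the two instruction sequences as functions. The names Ceq, CgtUn, NotEqualViaCeq and NotZeroViaCgtUn are hypothetical helpers written for this illustration, not real APIs.

```csharp
using System;

static class NotEqualZeroDemo
{
    // Models ceq: pushes 1 if the operands are equal, otherwise 0.
    public static int Ceq(int a, int b) { return a == b ? 1 : 0; }

    // Models cgt.un: compares the operands as unsigned 32-bit values.
    public static int CgtUn(int a, int b)
    {
        return unchecked((uint)a) > unchecked((uint)b) ? 1 : 0;
    }

    // The general "not equal" sequence: ceq, ldc.i4.0, ceq.
    public static int NotEqualViaCeq(int a, int b) { return Ceq(Ceq(a, b), 0); }

    // The shorter "not equal to zero" sequence: ldc.i4.0, cgt.un.
    public static int NotZeroViaCgtUn(int a) { return CgtUn(a, 0); }

    static void Main()
    {
        int[] samples = { int.MinValue, -1, 0, 1, 42, int.MaxValue };
        foreach (int value in samples)
        {
            Console.WriteLine("{0}: ceq/ceq = {1}, cgt.un = {2}",
                value, NotEqualViaCeq(value, 0), NotZeroViaCgtUn(value));
        }
    }
}
```

Both columns agree for every sample, including the negative values, because no unsigned value is smaller than 0.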

And hence while IYO "no-one with a sane mind makes the distinction between '-1' and '0' this way" it's a very sane way indeed to test for non-zero generally. And indeed, cgt.un appears often as just such a non-zero test.

And related: what was the original source code most likely?

If item.Found Then
  'More stuff
End If

Which is equivalent to the C#

if(item.Found != 0)
{
  //More stuff
}

I see that @xanatos had the correct answer, if not the full justifying logic, and was then convinced that they didn't. — Jon Hanna, Jun 11 '15 at 10:47
Good explanation, thanks for that. Perhaps it's just my head that's doing the confusion: I've been reading a lot on SSA compilers lately, and one of the common steps is variable boundary determination. That's useful for a lot of different things like elimination of boundary checks, etc. Now, if you have an equality operation, it's pretty easy to determine the source boundaries that matter, thereby reducing the complexity; with a construct like this it feels more difficult. After thinking about it some more (more precisely: 0 is a const), I'm not sure if it matters at all. — atlaste, Jun 11 '15 at 11:19
Well, a few things to that are: 1. Is your definition of the boundary valid (is it 100%, provably and demonstrably clear that the value can only be 0 or -1)? 2. Does the VB.NET compiler do such optimisation (in general the VB.NET and C# compilers do very little optimisation, leaving most of that to the jitter, and this was from the version from 9 years ago)? 3. Would the boundary check help? Eh, eliminating boundary checks lets us reduce code, but what would we reduce? Indeed, this can be somewhere where we introduce unsigned comparisons as per http://stackoverflow.com/a/29348411/400547 — Jon Hanna, Jun 11 '15 at 11:49
