
I have by pure chance discovered that the C# compiler turns this method:

static bool IsNotNull(object obj)
{
    return obj != null;
}

…into this CIL:

.method private hidebysig static bool IsNotNull(object obj) cil managed
{
    ldarg.0   // obj
    ldnull
    cgt.un
    ret
}

…or, if you prefer looking at decompiled C# code:

static bool IsNotNull(object obj)
{
    return obj > null;   // (note: this is not a valid C# expression)
}

How come the != gets translated as a ">"?


1 Answer


Short answer:

There is no "compare-not-equal" instruction in IL, so the C# != operator has no exact correspondence and cannot be translated literally.

There is however a "compare-equal" instruction (ceq, a direct correspondence to the == operator), so in the general case, x != y gets translated like its slightly longer equivalent (x == y) == false.

There is also a "compare-greater-than" instruction in IL (cgt) which allows the compiler to take certain shortcuts (i.e. generate shorter IL code), one being that inequality comparisons of objects against null, obj != null, get translated as if they were "obj > null".

Let's go into some more detail.

If there is no "compare-not-equal" instruction in IL, then how will the following method get translated by the compiler?

static bool IsNotEqual(int x, int y)
{
    return x != y;
}

As already said above, the compiler will turn x != y into (x == y) == false:

.method private hidebysig static bool IsNotEqual(int32 x, int32 y) cil managed 
{
    ldarg.0   // x
    ldarg.1   // y
    ceq
    ldc.i4.0  // false
    ceq       // (note: two comparisons in total)
    ret
}

It turns out that the compiler does not always produce this fairly long-winded pattern. Let's see what happens when we replace y with the constant 0:

static bool IsNotZero(int x)
{
    return x != 0;
}

The IL produced is somewhat shorter than in the general case:

.method private hidebysig static bool IsNotZero(int32 x) cil managed 
{
    ldarg.0    // x
    ldc.i4.0   // 0
    cgt.un     // (note: just one comparison)
    ret
}

The compiler can take advantage of the fact that signed integers are stored in two's complement. If the resulting bit patterns are interpreted as unsigned integers (that is what the .un suffix means), 0 has the smallest possible value, so the compiler translates x != 0 as if it were unchecked((uint)x) > 0.
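
If you want to see this equivalence from the C# side, here is a small sketch (the class name and the set of test values are mine, purely for illustration) that evaluates both forms for a few values, including negative ones:

using System;

class UnsignedTrickDemo   // hypothetical name, for illustration only
{
    static void Main()
    {
        foreach (int x in new[] { 0, 1, -1, int.MinValue, int.MaxValue })
        {
            bool viaNotEqual = x != 0;                         // what the source code says
            bool viaUnsignedGreaterThan = unchecked((uint)x) > 0;  // what the IL effectively does
            Console.WriteLine($"{x}: {viaNotEqual} / {viaUnsignedGreaterThan}");
        }
    }
}

Every line prints the same value twice, because the cast to uint keeps the bit pattern unchanged and only zero maps to the smallest unsigned value.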

It turns out the compiler can do just the same for inequality checks against null:

static bool IsNotNull(object obj)
{
    return obj != null;
}

The compiler produces almost the same IL as for IsNotZero:

.method private hidebysig static bool IsNotNull(object obj) cil managed 
{
    ldarg.0
    ldnull   // (note: this is the only difference)
    cgt.un
    ret
}

Apparently, the compiler is allowed to assume that the bit pattern of the null reference is the smallest bit pattern possible for any object reference.
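
To make that assumption a little more tangible in C#, here is a rough sketch of the idea (my own illustration, not what the compiler emits; it relies on System.Runtime.CompilerServices.Unsafe from modern .NET and assumes the null reference really is the all-zero bit pattern):

using System;
using System.Runtime.CompilerServices;

class NullBitsSketch   // hypothetical name, for illustration only
{
    static bool IsNotNullViaBits(object obj)
    {
        // Reinterpret the reference's bits as an integer; this relies on the
        // assumption stated above that the null reference is all-zero bits.
        ulong bits = (ulong)(long)Unsafe.As<object, IntPtr>(ref obj);
        return bits > 0;   // the unsigned "greater than zero" test that cgt.un performs
    }

    static void Main()
    {
        Console.WriteLine(IsNotNullViaBits(new object())); // True
        Console.WriteLine(IsNotNullViaBits(null));         // False
    }
}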

This shortcut is explicitly mentioned in the Common Language Infrastructure Annotated Standard (1st edition, Oct 2003), on page 491, in a footnote to Table 6-4, "Binary Comparisons or Branch Operations":

"cgt.un is allowed and verifiable on ObjectRefs (O). This is commonly used when comparing an ObjectRef with null (there is no "compare-not-equal" instruction, which would otherwise be a more obvious solution)."

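The CLR will in fact accept and run this exact instruction sequence if you emit it yourself. Here is a minimal sketch using System.Reflection.Emit (the class and helper names are my own, for illustration):

using System;
using System.Reflection.Emit;

class CgtUnNullCheck   // hypothetical name, for illustration only
{
    // Emits the same instruction sequence shown above: ldarg.0, ldnull, cgt.un, ret.
    static Func<object, bool> BuildIsNotNull()
    {
        var method = new DynamicMethod("IsNotNull", typeof(bool), new[] { typeof(object) });
        ILGenerator il = method.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0);
        il.Emit(OpCodes.Ldnull);
        il.Emit(OpCodes.Cgt_Un);
        il.Emit(OpCodes.Ret);
        return (Func<object, bool>)method.CreateDelegate(typeof(Func<object, bool>));
    }

    static void Main()
    {
        Func<object, bool> isNotNull = BuildIsNotNull();
        Console.WriteLine(isNotNull(new object())); // True
        Console.WriteLine(isNotNull(null));         // False
    }
}
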
  • Excellent answer, just one nit: two's complement is not relevant here. It only matters that signed integers are stored in such a way that non-negative values in `int`'s range have the same representation in `int` as they do in `uint`. That's a far weaker requirement than two's complement. –  Feb 28 '15 at 14:30
  • Unsigned types never have any negative numbers, so a comparison operation that compares to zero cannot treat any non-zero number as less than zero. All representations corresponding to the non-negative values of `int` have already been taken up by the same value in `uint`, so all representations corresponding to the negative values of `int` have to correspond to *some* value of `uint` greater than `0x7FFFFFFF`, but it doesn't really matter which value that is. (Actually, all that's really required is that zero is represented the same way in both `int` and `uint`.) –  Feb 28 '15 at 14:47
  • @hvd: Thanks for explaining. You are right, it is not two's complement that matters; it is the requirement [that you mentioned](http://stackoverflow.com/questions/28781839/why-does-the-c-sharp-compiler-translate-this-comparison-as-if-it-were-a-com#comment45842111_28781840) *and* the fact that `cgt.un` treats an `int` as an `uint` without changing the underlying bit pattern. (Imagine that `cgt.un` would first try to fix underflows by mapping all negative numbers to 0. In that case you obviously couldn't substitute `> 0` for `!= 0`.) – stakx - no longer contributing Feb 28 '15 at 14:53
  • Heh, yeah, good point, that's indeed one requirement I forgot to mention. :) –  Feb 28 '15 at 14:55
  • I find it surprising that comparing an object reference to another one using `>` is verifiable IL. That way one could compare two non-null objects and get a boolean result (which is non-deterministic). That's not a memory-safety issue but it feels like unclean design that is not in the general spirit of safe managed code. This design leaks the fact that object references are implemented as pointers. Seems like a design flaw of the .NET CLI. – usr Mar 01 '15 at 12:17
  • @usr: Absolutely! Section III.1.1.4 of the [CLI standard](http://bit.ly/1IesnAK) says that _"Object references (type O) are completely opaque"_ and that _"the only comparison operations permitted are equality and inequality…."_ Perhaps because object references are *not* defined in terms of memory addresses, the standard also takes care to conceptually keep the null reference apart from 0 (see e.g. the definitions of `ldnull`, `initobj`, and `newobj`). So the use of `cgt.un` to compare object references against the null reference appears to contradict section III.1.1.4 in more than one way. – stakx - no longer contributing Mar 01 '15 at 14:39
  • An alternative to `ceq; ldc.i4.0; ceq;` for the general case of "not-equals" is `ceq; ldc.i4.1; xor;`. The latter may be more suitable in some cases; be sure to check the release+optimized output of your target JIT to see how inlining affects the native instruction stream. – Glenn Slayden Dec 14 '18 at 01:49
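
To illustrate the equivalence mentioned in the last comment: ceq leaves either 0 or 1 on the stack, and both encodings simply invert that value. A tiny sketch (the class name is mine) of the two inversions in plain C#:

using System;

class NotEqualEncodings   // hypothetical name, for illustration only
{
    static void Main()
    {
        // ceq leaves 0 or 1 on the stack; both encodings below invert that value.
        foreach (int ceqResult in new[] { 0, 1 })
        {
            int viaSecondCeq = (ceqResult == 0) ? 1 : 0;  // ceq; ldc.i4.0; ceq
            int viaXor = ceqResult ^ 1;                   // ceq; ldc.i4.1; xor
            Console.WriteLine($"{ceqResult}: {viaSecondCeq} / {viaXor}");
        }
    }
}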