
I'm seriously doubting whether the C# or .NET JIT compilers perform any useful optimizations at all, much less whether they're competitive with even the most basic ones in C++ compilers.

Consider this extremely simple program, which I conveniently made to be valid in both C++ and C#:

#if __cplusplus
#else
static class Program
{
#endif
    static void Rem()
    {
        for (int i = 0; i < 1 << 30; i++) ;
    }
#if __cplusplus
    int main()
#else
    static void Main()
#endif
    {
        for (int i = 0; i < 1 << 30; i++)
            Rem();
    }
#if __cplusplus
#else
}
#endif

When I compile and run it in the newest version of C# (VS 2013) in release mode, it doesn't terminate in any reasonable amount of time.

Edit: Here's another example:

static class Program
{
    private static void Test2() { }

    private static void Test1()
    {
#if TEST
        Test2(); Test2(); Test2(); Test2(); Test2(); Test2(); Test2(); Test2();
        Test2(); Test2(); Test2(); Test2(); Test2(); Test2(); Test2(); Test2();
        Test2(); Test2(); Test2(); Test2(); Test2(); Test2(); Test2(); Test2();
        Test2(); Test2(); Test2(); Test2(); Test2(); Test2(); Test2(); Test2();
        Test2(); Test2(); Test2(); Test2(); Test2(); Test2(); Test2(); Test2();
        Test2(); Test2(); Test2(); Test2(); Test2(); Test2(); Test2(); Test2();
        Test2(); Test2(); Test2(); Test2(); Test2(); Test2(); Test2(); Test2();
#else
        Test2();
#endif
    }

    static void Main()
    {
        for (int i = 0; i < 0x7FFFFFFF; i++)
            Test1();
    }
}

When I run this one, it takes a lot longer if TEST is defined, even though everything is a no-op and Test2 should be inlined.

Even the most ancient C++ compilers I can get my hands on, however, optimize everything away, making both programs return immediately.

What prevents the .NET JIT optimizer from being able to make such simple optimizations? Why?

user541686
    The user is waiting while the JIT optimizer runs, so they typically do only optimizations that you can normally expect to have low cost and high pay-off. – Jerry Coffin Dec 05 '13 at 08:05
  • @JerryCoffin: ...optimizing out an empty loop is high cost and low payoff? o.O – user541686 Dec 05 '13 at 08:12
  • @Jerry Coffin - There's plenty of time at the C# -> MSIL compilation phase to perform optimisations such as dead code removal. – JoeG Dec 05 '13 at 08:14
  • With the prevalence of multicore, there isn't even a need to make the user wait. – Potatoswatter Dec 05 '13 at 08:15
  • Just a note, for anyone who thinks there's something special about having 2 loops that trips up the otherwise-present optimizations, you should just replace `int` with `long` (be careful about C++ though, it's `long long` there) and count up to a larger number (e.g. `1LL << 60`). There aren't **any** optimizations going on, this is by no means an "edge case" of any sort. – user541686 Dec 05 '13 at 08:19
  • @Mehrdad: No, but it's not an optimization you *normally expect* to have a big payoff. – Jerry Coffin Dec 05 '13 at 08:24
  • @JerryCoffin: ...er, isn't dead code elimination one of the biggest-payoff optimizations in C++? – user541686 Dec 05 '13 at 08:25
  • Silly question but what optimization flags did you use? For C#: By default, optimizations are disabled. Specify /optimize+ to enable optimizations – Glenn Teitelbaum Dec 05 '13 at 08:25
  • @GlennTeitelbaum: I used release mode, so whatever release mode uses by default. – user541686 Dec 05 '13 at 08:26
  • @Mehrdad: Not as a general rule, no. Peephole optimization tends to be the biggest payoff for the smallest investment. Next in line would probably be loop invariant code hoisting and common subexpression elimination. Dead code elimination *typically* just reduces executable size (since truly dead code is never executed at all). – Jerry Coffin Dec 05 '13 at 08:31
  • @GlennTeitelbaum the C# compiler flag `/optimize+` has basically no effect. It toggles optimizations for the *static* compiler, which only does a few *very* basic optimizations. Mostly, toggling optimizations there makes absolutely no difference – jalf Dec 05 '13 at 08:33
  • I guess the question is "why is a loop without a body not considered dead code in C#" – not whether a function that is empty or does nothing useful will be eliminated... But for some reason a loop without side-effects is not converted to a no-op... (Could even be because not many people actually expect it to happen)... – Alexei Levenkov Dec 05 '13 at 08:57
  • This is not dead code elimination. Also it's not a very important optimisation to do. (So the doubt about actually useful optimisations does not follow) – Cat Plus Plus Dec 05 '13 at 09:19
  • @CatPlusPlus: Wait what? Then what *is* dead code elimination? – user541686 Dec 05 '13 at 09:23
  • Elimination of code that doesn't execute. This loop executes, so it's live code by definition. – Cat Plus Plus Dec 05 '13 at 09:23
  • @CatPlusPlus: No, that's not the definition. You should read the first and second sentences of Wikipedia on dead code elimination. – user541686 Dec 05 '13 at 09:25
  • Related (if not duplicate): http://stackoverflow.com/questions/539047/does-net-jit-optimize-empty-loops-away and http://stackoverflow.com/questions/7288428/is-there-a-way-to-get-the-net-jit-or-c-sharp-compiler-to-optimize-away-empty-fo – Oak Dec 05 '13 at 09:27
  • Oh right, *Wikipedia*. Silly me. Regardless it's still not a very important optimisation to do. – Cat Plus Plus Dec 05 '13 at 09:28
  • @CatPlusPlus: It certainly is, you just made that up out of nowhere. If you don't believe me you should compile your performance-critical C++ programs without DCE and see if you notice a significant difference. – user541686 Dec 05 '13 at 09:34
  • IOW, relying on the compiler to remove useless code that shouldn't be there in the first place is silly and terrible programming, and it doesn't say anything about the quality of compilers that don't do that. – Cat Plus Plus Dec 05 '13 at 09:41
  • @CatPlusPlus: If you don't have a good answer you don't *have* to shoot down the question. I didn't either, that's why I asked the question. Maybe someone else will and we will both be able to learn. – user541686 Dec 05 '13 at 09:45
  • @CatPlusPlus gosh, if only compiler writers had had access to your wisdom. They could have saved a lot of time if they'd known that dead code elimination was pointless. Odd that they *continue* to spend time improving it though, don't you think? They must be really stoopid – jalf Dec 05 '13 at 09:52
  • 10 people found it useful. One person did not. – jalf Dec 05 '13 at 09:59
  • @JerryCoffin: Sorry I forgot to reply to your comment, you might have already seen this but if not: regarding when you said *"...since truly dead code is never executed at all"*, that's not true -- dead code can certainly be executed. It's merely code that doesn't affect the output of the program. If a piece of code only ever writes to dead variables, it's dead code, even if it executes. (See the Wikipedia article I referred to earlier in response to another comment.) – user541686 Dec 05 '13 at 10:58
  • The .net JITter sucks, but I never missed that particular optimization. Far more annoying IMO: many CPU instructions have no C# equivalent (64-bit multiplications, integer rotations, endian conversion...), no SIMD, a poor inliner, no optimizations for delegate calls, and it doesn't optimize out most array bounds checks. Unlike dead code, those can't easily be worked around and typically cost a factor of 2-6 compared with optimized C. – CodesInChaos Jan 02 '14 at 23:38

1 Answer


The .NET JIT is a poor compiler; this is true. Fortunately, a new JIT (RyuJIT) and an NGEN that seems to be based on the VC compiler are in the works (I believe this is what the Windows Phone cloud compiler uses).

Although it is a very simple compiler, it does inline small functions and remove side-effect-free loops to a certain extent. It is not good at any of this, but it does happen.

Before we go into the detailed findings, note that the x86 and x64 JITs are different codebases; they perform differently and have different bugs.


Test 1:

You ran the program in Release mode as a 32-bit process. I can reproduce your findings on .NET 4.5 in 32-bit mode. Yes, this is embarrassing.

In 64-bit mode, though, Rem in the first example is inlined and the innermost of the two nested loops is removed:

[screenshot: x64 disassembly of Main showing the remaining loop instructions]

I have marked the three loop instructions. The outer loop is still there. I don't think that ever matters in practice because you rarely have two nested dead loops.

Note that the loop was unrolled 4 times, and the unrolled iterations were then collapsed into a single iteration (unrolling produced `i += 1; i += 1; i += 1; i += 1;`, which was collapsed to `i += 4;`). Granted, the entire loop could be optimized away, but the JIT did perform the things that matter most in practice: unrolling loops and simplifying code.

I also added the following to Main to make it easier to debug:

    Console.WriteLine(IntPtr.Size); //verify bitness
    Debugger.Break(); //attach debugger


Test 2:

I cannot fully reproduce your findings in either 32 bit or 64 bit mode. In all cases Test2 is inlined into Test1 making it a very simple function:

[screenshot: disassembly of Test1 with Test2 inlined]

Main calls Test1 in a loop because Test1 was too big to inline (the non-simplified size is what counts, because methods are JITted in isolation).

When you have only a single Test2 call in Test1 then both functions are small enough to be inlined. This enables the JIT for Main to discover that nothing is being done at all in that code.


Final answer: I hope I could shed some light on what is going on. In the process I did discover some important optimizations being performed. The JIT is just not very thorough or complete. If the same optimizations were simply performed in a second, identical pass, a lot more could be simplified here. But most programs only need one pass through all the simplifiers, and I agree with the choice the JIT team made here.

So why is the JIT so bad? One part is that it must be fast because JITing is latency-sensitive. Another part is that it is just a primitive JIT and needs more investment.

usr
  • +1 this is a damn good answer, I didn't realize that 64-bit might be different (in fact, I didn't realize I wasn't running the 64-bit version)! And I had no idea another JIT was in the works. Thanks a lot for posting this! – user541686 Dec 05 '13 at 12:04
  • "One part is that it must be fast because JITing is latency-sensitive", I think that trade-off is the key. If I profile my code and find that it's slow because it has a dead loop in it, I slap my forehead and say "Why did I leave that there?!", delete the dead loop, problem solved. If the JIT has to perform dead loop analysis every time it runs every app in the world, we all suffer a performance hit. Whereas if we had to do our own inlining by hand, that would undermine the value of most language/runtime features, hence the JIT should (and does) inline for us. – Daniel Earwicker Dec 05 '13 at 12:19
  • (cont) However, this says nothing of what the C# compiler should/could do to improve the situation. The trade-off there is the overhead of compilation times rather than runtime/startup, but surely some compiler flags to control optimisation strength would solve this? – Daniel Earwicker Dec 05 '13 at 12:20
  • It is my understanding that most dead code and simplification opportunities arise from other transformations like inlining. Inlining might propagate constants, causing certain code paths to be statically known to be unreachable. That's where a lot of dead code comes from. I also think that csc.exe could do a better job. What would be even better would be a precompilation model (like ngen) with the full Visual C compiler backend behind it. That would also enable auto-vectorization/SIMD. – usr Dec 05 '13 at 12:27
  • ^ Yes, dead code usually arises from other transformations, thank you for mentioning it. :) – user541686 Dec 05 '13 at 12:32
  • Unfortunately, RyuJIT is worse than the "legacy" x64 JIT – Alex Zhukovskiy Apr 03 '15 at 22:05
  • @AlexZhukovskiy this is true. I have run extensive code quality tests with RyuJIT in the meantime and quality has decreased. Sometimes significantly. The only really good thing that I found was loop cloning which removes range checks from 99.9% of all loops. – usr Apr 04 '15 at 10:12
  • @usr it's also more familiar with SIMD instructions (for example `vsqrtsd` instead of `fsqrt`) but it really doesn't matter because of the very poor performance – Alex Zhukovskiy Apr 05 '15 at 08:53