99

I'm finding massive performance differences between similar code in C and C#.

The C code is:

#include <stdio.h>
#include <time.h>
#include <math.h>

int main(void)
{
    int i;
    double root;
    
    clock_t start = clock();
    for (i = 0 ; i <= 100000000; i++){
        root = sqrt(i);
    }
    printf("Time elapsed: %f\n", ((double)clock() - start) / CLOCKS_PER_SEC);   

}

And the C# (console app) is:

using System;
using System.Collections.Generic;
using System.Text;

namespace ConsoleApplication2
{
    class Program
    {
        static void Main(string[] args)
        {
            DateTime startTime = DateTime.Now;
            double root;
            for (int i = 0; i <= 100000000; i++)
            {
                root = Math.Sqrt(i);
            }
            TimeSpan runTime = DateTime.Now - startTime;
            Console.WriteLine("Time elapsed: " + Convert.ToString(runTime.TotalMilliseconds/1000));
        }
    }
}

With the above code, the C# completes in 0.328125 seconds (release version) and the C takes 11.14 seconds to run.

The C is being compiled to a Windows executable using mingw.

I've always been under the assumption that C/C++ were faster or at least comparable to C#.net. What exactly is causing the C code to run over 30 times slower?

EDIT: It does appear that the C# optimizer was removing the sqrt call, as root wasn't being used. I changed the root assignment to root += and printed out the total at the end. I've also compiled the C using cl.exe with the /O2 flag set for maximum speed.

The results are now: 3.75 seconds for the C and 2.61 seconds for the C#.

The C is still taking longer, but this is acceptable.
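
For reference, here is a minimal sketch of what the adjusted C benchmark could look like (an illustration of the root += change described above, not the exact source that produced the 3.75-second figure; build it with an optimizing flag such as cl /O2 or gcc -O2):

#include <stdio.h>
#include <time.h>
#include <math.h>

int main(void)
{
    int i;
    double root = 0.0;

    clock_t start = clock();
    /* Accumulating the results keeps the optimizer from discarding the loop. */
    for (i = 0; i <= 100000000; i++) {
        root += sqrt((double)i);
    }
    /* Printing the total makes the computation observable, so it cannot be removed. */
    printf("Total: %f\n", root);
    printf("Time elapsed: %f\n", ((double)clock() - start) / CLOCKS_PER_SEC);
    return 0;
}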

Josh Correia
John
  • 21
    I would suggest you use a StopWatch instead of just a DateTime. – Alex Fort Mar 26 '09 at 16:25
  • 3
    Which compiler flags? Are both compiled with optimizations enabled? – jalf Mar 26 '09 at 16:45
  • 2
    What about when you use -ffast-math with the C++ compiler? – Dan McClain Mar 26 '09 at 17:09
  • 12
    What a fascinating question! – Robert S. Mar 26 '09 at 19:47
  • 4
    Maybe the C sqrt function is not as good as the one in C#. Then it wouldn't be an issue with C, but with the library attached to it. Try some calculations without math functions. – klew Mar 26 '09 at 19:56
  • 2
    A while back, I wrote an interesting, though not comprehensive, comparison between Mono's implementation of the CLR and C, you may want to check that out: http://www.trausch.us/2008/10/09/sometimes-learning-happens-strangely/ – Michael Trausch Mar 27 '09 at 19:46
  • 2
  • Can I ask what version of the .NET Framework you used? And whether you compiled the test for x86 or x64? It seems there are some speed differences between versions of .NET – Mack Mar 24 '10 at 00:47
  • It might have to do with a compiler optimization that the C# version is using but the C version is not. Do you have optimization disabled for both? – DShook Mar 26 '09 at 16:32
  • 1
    The article @MichaelTrausch linked to can now be found here: http://mike.trausch.us/blog/2008/10/09/sometimes-learning-happens-strangely/ – Daniel A.A. Pelsmaeker Feb 12 '13 at 14:23
  • 1
    I'd suggest using the fast inverse square root (http://en.wikipedia.org/wiki/Fast_inverse_square_root) in the C code, and then taking 1 / the result ;) – John Gowers Sep 02 '13 at 10:32
  • It's only testing the speed of Math.Sqrt vs the implementation that is used in C. Probably both suck and one will suck more than the other. By using exactly the same code, C will always beat C#, it's just the nature of the beast. Try something simple, for example let's take the game of life. With exactly the same code, going through one million generations and then printing the result to console, it would take C# about 14s. The exact same code (copy-paste) will take about 5s in C with -O3. That's a huge difference which could save lives in some cases. – Viezevingertjes Sep 15 '18 at 09:11
  • What a flawed benchmark! The root call is just thrown away by the compiler. – aleck Mar 14 '19 at 04:40

13 Answers

172

You must be comparing debug builds. I just compiled your C code, and got

Time elapsed: 0.000000

If you don't enable optimizations, any benchmarking you do is completely worthless. (And if you do enable optimizations, the loop gets optimized away. So your benchmarking code is flawed too. You need to force it to run the loop, usually by summing up the result or similar, and printing it out at the end)

It seems that what you're measuring is basically "which compiler inserts the most debugging overhead". And it turns out the answer is C. But that doesn't tell us which program is fastest, because when you want speed, you enable optimizations.

By the way, you'll save yourself a lot of headaches in the long run if you abandon any notion of languages being "faster" than each other. C# no more has a speed than English does.

There are certain things in the C language that would be efficient even in a naive non-optimizing compiler, and there are others that rely heavily on a compiler to optimize everything away. And of course, the same goes for C# or any other language.

The execution speed is determined by:

  • the platform you're running on (OS, hardware, other software running on the system)
  • the compiler
  • your source code

A good C# compiler will yield efficient code. A bad C compiler will generate slow code. What about a C compiler which generated C# code, which you could then run through a C# compiler? How fast would that run? Languages don't have a speed. Your code does.

jalf
  • Lots more interesting reading here: http://blogs.msdn.com/ricom/archive/2005/05/10/416151.aspx – Daniel Earwicker Mar 26 '09 at 16:57
  • 22
    Good answer, but I disagree about language speed, at least in analogy: It's been found that Welsh is a slower language than most because of the high frequency of long vowels. Additionally, people remember words (and word lists) better if they are faster to say. http://web.missouri.edu/~cowann/docs/articles/before%201993/Cowan%20et%20al%20JML%201992%20verbal%20output%20time.pdf http://en.wikipedia.org/wiki/Vowel_length http://en.wikipedia.org/wiki/Welsh_language – exceptionerror Jun 12 '09 at 09:42
  • 1
    Doesn't that depend on what you're *saying* in Welsh though? I find it unlikely that *everything* is slower. – jalf Jun 12 '09 at 11:43
  • 5
    ++ Hey guys, don't get sidetracked here. If the same program runs faster in one language than another, it's because different assembly code is generated. In this particular example, 99% or more of the time will go into floating `i`, and `sqrt`, so that's what is being measured. – Mike Dunlavey Nov 27 '09 at 18:59
124

I'll keep it brief, it is already marked answered. C# has the great advantage of having a well defined floating point model. That just happens to match the native operation mode of the FPU and SSE instruction set on x86 and x64 processors. No coincidence there. The JITter compiles Math.Sqrt() to a few inline instructions.

Native C/C++ is saddled with years of backwards compatibility. The /fp:precise, /fp:fast and /fp:strict compile options are the most visible. Accordingly, it must call a CRT function that implements sqrt() and checks the selected floating point options to adjust the result. That's slow.
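
As an illustration of this point (a sketch added here, not part of the original answer; it assumes an SSE2-capable x86/x64 target and a compiler that ships emmintrin.h, which both MSVC and GCC do), the same scalar SSE2 square-root instruction the JITter emits can also be reached from C directly, bypassing the CRT's sqrt() wrapper:

#include <emmintrin.h>  /* SSE2 intrinsics */

/* Illustrative helper: computes sqrt(x) with a single sqrtsd instruction.
   Unlike the CRT's sqrt(), it does no domain-error handling (a negative
   input simply yields NaN) and ignores the /fp compile options. */
static double sqrt_sse2(double x)
{
    __m128d v = _mm_set_sd(x);
    return _mm_cvtsd_f64(_mm_sqrt_sd(v, v));
}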

Hans Passant
  • 5
    you do not need to call any CRT function to fsqrt. Just use -ffast-math or inline asm. Native will **always** be faster than "managed code", but sometimes you will need to tell your compiler what to do. – user877329 Mar 17 '12 at 09:54
  • 71
    This is an odd conviction among C++ programmers, they seem to think that the machine code generated by C# is somehow different from the machine code generated by a native compiler. There's only one kind. No matter what gcc compiler switch you use or inline assembly you write, there's still only one FSQRT instruction. It isn't always faster because a native language generated it, the cpu doesn't care. – Hans Passant Mar 17 '12 at 10:09
  • 3
    But you forget the time in JIT. The problem is that the start-up time increases when the JIT does a better job. Java hotspot matches up with unoptimized C-code on my system. – user877329 Mar 17 '12 at 10:43
  • 18
    That's what pre-jitting with ngen.exe solves. We're talking about C#, not Java. – Hans Passant Mar 17 '12 at 14:21
  • 1
    CLR or Java. What's the difference? – user877329 Mar 17 '12 at 15:04
  • Just for clarity, unless a new .NET added SSE, most .NET versions don't SSE-optimize. Besides, the sample code above wouldn't require that anyway. –  Oct 19 '12 at 17:51
  • 10
    No, the x64 jitter uses SSE. Math.Sqrt() gets translated to the sqrtsd machine code instruction. – Hans Passant Oct 19 '12 at 18:37
  • 1
    @annoying_squid: See http://blogs.msdn.com/b/davidnotario/archive/2005/08/15/451845.aspx – BlueRaja - Danny Pflughoeft Dec 17 '12 at 00:44
  • 6
    While it's technically not a difference between languages, the .net JITter does rather limited optimizations compared to a typical C/C++ compiler. One of the biggest limitations is lack of SIMD support making the code often around 4x slower. Not exposing many intrinsics can be a big malus as well, but that depends a lot on what you're doing. – CodesInChaos Mar 16 '13 at 09:53
63

Since you never use 'root', the compiler may have been removing the call to optimize your method.

You could try to accumulate the square root values into an accumulator, print it out at the end of the method, and see what's going on.

Edit : see Jalf's answer below

Brann
  • 1
    A little experimentation suggests this isn't the case. The code for the loop is generated, although perhaps the runtime is smart enough to skip it. Even accumulating, C# still beats the pants off C. – Dana Mar 26 '09 at 16:47
  • 3
    It seems the problem is on the other end. C# behaves reasonably in all cases. His C code is apparently compiled without optimizations – jalf Mar 26 '09 at 16:50
  • 2
    A lot of you are missing the point here. I've been reading many similar cases where C# outperforms C/C++, and always the rebuttal is to employ some expert-level optimization. 99% of programmers don't have the knowledge to use such optimization techniques just to get their code to run slightly faster than the C# code. Use cases for C/C++ are narrowing. –  Aug 23 '16 at 16:11
60

I'm a C++ and a C# developer. I've developed C# applications since the first beta of the .NET framework and I have more than 20 years of experience in developing C++ applications. Firstly, C# code will NEVER be faster than a C++ application, but I won't go through a lengthy discussion about managed code, how it works, the inter-op layer, memory management internals, the dynamic type system and the garbage collector. Nevertheless, let me continue by saying that the benchmarks listed here all produce INCORRECT results.

Let me explain: the first thing we need to consider is the JIT compiler for C# (.NET Framework 4). The JIT produces native code for the CPU using various optimization algorithms (which tend to be more aggressive than the default C++ optimizer that comes with Visual Studio), and the instruction set used by the .NET JIT compiler is a closer reflection of the actual CPU on the machine, so certain substitutions in the machine code can be made to reduce clock cycles, improve the hit rate in the CPU pipeline cache and produce further hyper-threading optimizations such as instruction reordering and improvements relating to branch prediction.

What this means is that unless you compile your C++ application using the correct parameters for the RELEASE build (not the DEBUG build), your C++ application may perform more slowly than the corresponding C# or .NET based application. When specifying the project properties on your C++ application, make sure you enable "full optimization" and "favour fast code". If you have a 64-bit machine, you MUST specify x64 as the target platform, otherwise your code will be executed through a conversion sub-layer (WOW64), which will substantially reduce performance.

Once you perform the correct optimizations in the compiler, I get .72 seconds for the C++ application and 1.16 seconds for the C# application (both in release build). Since the C# application is very basic and allocates the memory used in the loop on the stack and not on the heap, it is actually performing a lot better than a real application would, one involving objects, heavy computations and larger data-sets. So the figures provided are optimistic figures biased towards C# and the .NET framework. Even with this bias, the C++ application completes in just over half the time of the equivalent C# application. Keep in mind that the Microsoft C++ compiler I used did not have the right pipeline and hyperthreading optimizations (using WinDbg to view the assembly instructions).

Now if we use the Intel compiler (which, by the way, is an industry secret for generating high-performance applications on AMD/Intel processors), the same code executes in .54 seconds for the C++ executable vs the .72 seconds using Microsoft Visual Studio 2010. So in the end, the final results are .54 seconds for C++ and 1.16 seconds for C#. The code produced by the .NET JIT compiler therefore takes about 214% as long as the C++ executable. Most of the time spent in the .54 seconds was in getting the time from the system and not within the loop itself!

What is also missing in the statistics is the startup and cleanup times which are not included in the timings. C# applications tend to spend a lot more time on start-up and on termination than C++ applications. The reason behind this is complicated and has to do with the .NET runtime code validation routines and the memory management subsystem which performs a lot of work at the beginning (and consequently, the end) of the program to optimize the memory allocations and the garbage collector.

When measuring the performance of C++ and .NET IL, it is important to look at the assembly code to make sure that ALL the calculations are there. What I found is that without putting some additional code in C#, most of the code in the examples above was actually removed from the binary. This was also the case with C++ when you used a more aggressive optimizer such as the one that comes with the Intel C++ compiler. The results I provided above are 100% correct and validated at the assembly level.
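
One low-tech way to keep a C compiler from deleting the benchmarked work (a sketch added for illustration, not part of the original answer; the function name is made up) is to store each result into a volatile variable, since writes to a volatile object are observable behaviour that the optimizer must preserve:

#include <math.h>

volatile double sink;   /* writes to this cannot be optimized away */

void run_loop(void)
{
    for (int i = 0; i <= 100000000; i++) {
        sink = sqrt((double)i);  /* each store forces the sqrt to be computed */
    }
}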

The main problem with a lot of forums on the internet is that a lot of newbies listen to Microsoft marketing propaganda without understanding the technology and make false claims that C# is faster than C++. The claim is that in theory, C# is faster than C++ because the JIT compiler can optimize the code for the CPU. The problem with this theory is that there is a lot of plumbing in the .NET framework that slows the performance; plumbing which does not exist in a C++ application. Furthermore, an experienced developer will know the right compiler to use for the given platform and use the appropriate flags when compiling the application. On Linux or open-source platforms, this is not a problem because you could distribute your source and create installation scripts that compile the code using the appropriate optimization. On the Windows or closed-source platform, you will have to distribute multiple executables, each with specific optimizations. The Windows binaries that will be deployed are based on the CPU detected by the MSI installer (using custom actions).

Richard
  • 27
    1. Microsoft never made those claims about C# being faster; their claims are that it's about 90% of the speed, faster to develop (and hence more time to tune) and more bug-free due to memory and type safety, all of which are true (I have 20 years in C++ and 10 in C#). 2. Startup performance is meaningless in most cases. 3. There are also faster C# compilers like LLVM (so bringing out Intel is not apples to apples). – ben Jul 18 '10 at 03:02
  • 13
    Startup performance is not meaningless. It is very important in most enterprise web based application which is why Microsoft introduced web pages to be preloaded (autostart) in .NET 4.0. When the application pool is recycled every once in a while, the first time each page loads will add a significant delay for complex pages and cause time-outs on the browser. – Richard Jul 19 '10 at 09:15
  • 8
    Microsoft made the claims about the performance of .NET being faster in earlier marketing material. They also made various claims about the garbage collector having little or no impact on performance. Some of these claims made it into various books (on ASP.NET and .NET) in their earlier editions. Although Microsoft don't specifically say that your C# application will be faster than your C++ application, they do make sweeping generic comments and marketing slogans such as "Just-In-Time Means Run-It-Fast" (http://msdn.microsoft.com/en-us/library/ms973894.aspx). – Richard Jul 19 '10 at 09:48
  • 3
    Great post ! +1 -- The main problem with a lot of forums on the internet that a lot of newbie's listen to Microsoft marketing propaganda without understanding the technology – vicsz Feb 25 '11 at 19:52
  • 79
    -1, this rant is full of incorrect and misleading statements such as the obvious whopper "C# code will NEVER be faster than a C++ application" – BCoates Nov 06 '11 at 06:27
  • 36
    -1. You should read Rico Mariani vs Raymond Chen's C# vs C performance battle: http://blogs.msdn.com/b/ricom/archive/2005/05/16/418051.aspx. In short: it took one of the smartest guys in Microsoft a lot of optimizing to make the C version faster than a simple C# version. – Rolf Bjarne Kvinge Sep 04 '12 at 11:08
  • 2
    And you'd believe everything Microsoft feeds you on their blogs? – pipja Jun 16 '14 at 03:48
  • 1
    @BCoates, can you please explain better why Richard is wrong? For a programmer without experience in these things (like I am), the Richard's answer seems reasoned, documented and tested. – Massimiliano Kraus Sep 16 '16 at 07:52
10

My first guess is a compiler optimization, because you never use root. You just assign it, then overwrite it again and again.

Edit: damn, beat by 9 seconds!

Neil N
  • 2
    I say you are correct. The actual variable is overwritten and never used beyond that. The csc would most likely just forgo the whole loop, while the C++ compiler probably left it in. A more accurate test would be to accumulate the results and then print that result out at the end. Also, one should not hard-code the seed value, but rather leave it user-defined. This would not give the C# compiler any room to leave stuff out. –  Oct 19 '12 at 17:56
7

To see if the loop is being optimised away, try changing your code to

root += Math.Sqrt(i);

and similarly in the C code, and then print the value of root outside the loop.

6

Maybe the C# compiler is noticing you don't use root anywhere, so it just skips the whole for loop. :)

That may not be the case, but I suspect whatever the cause is, it is compiler implementation dependent. Try compiling your C program with the Microsoft compiler (cl.exe, available as part of the Win32 SDK) with optimizations and Release mode. I bet you'll see a perf improvement over the other compiler.

EDIT: I don't think the compiler can just optimize out the for loop, because it would have to know that Math.Sqrt() doesn't have any side-effects.

i_am_jorf
  • 2
    @Neil, @jeff: Agreed, it could know that pretty easily. Depending on the implementation, static analysis on Math.Sqrt() might not be that hard, although I'm not sure what optimizations are specifically performed. – John Feminella Mar 26 '09 at 16:43
6

I put together (based on your code) two more comparable tests in C and C#. These two write to a smaller array, using the modulus operator for indexing (it adds a little overhead, but hey, we're trying to compare performance [at a crude level]).

C code:

#include <stdlib.h>
#include <stdio.h>
#include <time.h>
#include <math.h>

int main(void)
{
    int count = (int)1e8;
    int subcount = 1000;
    double* roots = (double*)malloc(sizeof(double) * subcount);
    clock_t start = clock();
    for (int i = 0 ; i < count; i++)
    {
        roots[i % subcount] = sqrt((double)i);
    }
    clock_t end = clock();
    double length = ((double)end - start) / CLOCKS_PER_SEC;
    printf("Time elapsed: %f\n", length);
}

In C#:

using System;

namespace CsPerfTest
{
    class Program
    {
        static void Main(string[] args)
        {
            int count = (int)1e8;
            int subcount = 1000;
            double[] roots = new double[subcount];
            DateTime startTime = DateTime.Now;
            for (int i = 0; i < count; i++)
            {
                roots[i % subcount] = Math.Sqrt(i);
            }
            TimeSpan runTime = DateTime.Now - startTime;
            Console.WriteLine("Time elapsed: " + Convert.ToString(runTime.TotalMilliseconds / 1000));
        }
    }
}

These tests write data to an array (so the .NET runtime shouldn't be allowed to cull the sqrt op), although the array is significantly smaller (I didn't want to use excessive memory). I compiled these in release config and ran them from inside a console window (instead of starting through VS).

On my computer the C# program varies between 6.2 and 6.9 seconds, while the C version varies between 6.9 and 7.1.

Cecil Has a Name
5

Whatever the time difference may be, that "elapsed time" is invalid. It would only be valid if you could guarantee that both programs run under exactly the same conditions.

Maybe you should try a Windows equivalent of $ /usr/bin/time my_cprog; /usr/bin/time my_csprog

Tom
  • 1
    Why is this downvoted? Is anyone assuming that interrupts and context switches don't affect performance? Can anyone make assumptions on TLB misses, page swapping, etc? – Tom Mar 27 '09 at 19:45
5

If you just single-step the code at the assembly level, including stepping through the square-root routine, you will probably get the answer to your question.

No need for educated guessing.

Mike Dunlavey
  • I'd like to know how to do this – Josh Stodola Mar 26 '09 at 19:51
  • Depends on your IDE or debugger. Break at the start of the pgm. Display the disassembly window, and start single-stepping. If using GDB, there are commands for stepping one instruction at a time. – Mike Dunlavey Mar 27 '09 at 11:12
  • Now that is a good tip, this helps one understand much more what is actually going on down there. Does that also show JIT optimizations like inlining and tail calls? – gjvdkamp May 31 '11 at 07:39
  • FYI: for me this showed VC++ using fadd and fsqrt whereas C# used cvtsi2sd and sqrtsd which as I understand are SSE2 instructions and so considerably faster where supported. – danio Jan 10 '12 at 16:56
2

The other factor that may be an issue here is that the C compiler compiles to generic native code for the processor family you target, whereas the MSIL generated when you compiled the C# code is then JIT-compiled to target the exact processor you have, complete with any optimisations that may be possible. So the native code generated from the C# may be considerably faster than the C.

David M
  • In theory, yes. In practice, that virtually never makes a measurable difference. A percent or two, perhaps, if you're lucky. – jalf Mar 26 '09 at 16:48
  • Or if you have certain types of code that use extensions that aren't in the allowed list for the 'generic' processor. Things like SSE flavours. Try with the processor target set higher, to see what differences you get. – gbjbaanb Mar 26 '09 at 19:55
1

It would seem to me that this has nothing to do with the languages themselves; rather, it has to do with the different implementations of the square root function.

Jack Ryan
1

Actually guys, the loop is NOT being optimized away. I compiled John's code and examined the resulting .exe. The guts of the loop are as follows:

 IL_0005:  stloc.0
 IL_0006:  ldc.i4.0
 IL_0007:  stloc.1
 IL_0008:  br.s       IL_0016
 IL_000a:  ldloc.1
 IL_000b:  conv.r8
 IL_000c:  call       float64 [mscorlib]System.Math::Sqrt(float64)
 IL_0011:  pop
 IL_0012:  ldloc.1
 IL_0013:  ldc.i4.1
 IL_0014:  add
 IL_0015:  stloc.1
 IL_0016:  ldloc.1
 IL_0017:  ldc.i4     0x5f5e100
 IL_001c:  ble.s      IL_000a

Unless the runtime is smart enough to realize the loop does nothing and skips it?

Edit: Changing the C# to be:

 static void Main(string[] args)
 {
      DateTime startTime = DateTime.Now;
      double root = 0.0;
      for (int i = 0; i <= 100000000; i++)
      {
           root += Math.Sqrt(i);
      }
      System.Console.WriteLine(root);
      TimeSpan runTime = DateTime.Now - startTime;
      Console.WriteLine("Time elapsed: " +
          Convert.ToString(runTime.TotalMilliseconds / 1000));
 }

Results in the time elapsed (on my machine) going from 0.047 to 2.17. But is that just the overhead of adding 100 million addition operations?

Dana
  • 3
    Looking at the IL doesn't tell you much about optimizations because although the C# compiler does some things like constant folding and removing dead code, the JIT then takes over and does the rest at load time. – Daniel Earwicker Mar 26 '09 at 16:47
  • That's what I thought might be the case. Even forcing it to do work, though, it's still 9 seconds faster than the C version. (I wouldn't have expected that at all) – Dana Mar 26 '09 at 16:49