Performance of Managed C++ Vs UnManaged/native C++

Question

I am writing a very high performance application that handles and processes hundreds of events every millisecond.

Is unmanaged C++ faster than managed C++, and why?

Managed C++ deals with the CLR instead of the OS, and the CLR takes care of memory management, which simplifies the code. Is that probably also more efficient than code written by "a programmer" in unmanaged C++, or is there some other reason? When using managed code, how can one avoid dynamic memory allocation, which causes a performance hit, if it is all transparent to the programmer and handled by the CLR?

So, coming back to my question: is managed C++ more efficient in terms of speed than unmanaged C++, and why?

@Jerry, true, but many who come more recently to C++/CLI call it managed C++ since it has C++ syntax and produces managed code. They don't know it was the name of something else. Notice the question says managed C++ not Managed C++. — Kate Gregory, Jun 13 '10 at 19:38

Jerry Coffin · Answer 1 · 2016-11-20T06:07:49.110

There is no one answer to this. As a really general rule, native code will usually be faster, but 1) that's not always the case, 2) sometimes the difference is too small to care about, and 3) how well the code is written will usually make more difference than managed vs. unmanaged.

Managed code runs in a virtual machine. Basically, you start with a compiler that produces byte codes as output, then feed that to the virtual machine. The virtual machine then re-compiles it to machine code and executes that. This can provide some real advantages under some circumstances. For one example, if you have a 64-bit processor running a 64-bit VM (pretty nearly a given any more) but an old program written before 64-bit processors were common, the VM will still compile that byte code to 64-bit machine code, which can give quite a substantial speed advantage for at least some code.

At the same time, it can also be a fairly noticeable disadvantage for some code. In particular, the compiler is running while the user waits. To accommodate that, the VM's compiler can't itself run very slowly. Although native code generators differ, there's a pretty fair chance that whatever native compiler you choose includes at least a few optimizations that were forgone in the VM's bytecode compiler to keep its resource usage reasonable.

The VM also uses a garbage collector. Garbage collectors have rather different characteristics from manually managing memory. With many manual managers, allocating memory is fairly expensive. Releasing memory is fairly cheap, but roughly linear in the number of items you release. Other manual managers roughly reverse that, doing extra work when freeing memory in order to make allocation faster. Either way, the cost structure is different from a typical collector.

With a garbage collector, allocating memory is typically very cheap. With a typical (copying) collector, the cost of releasing memory depends primarily upon the number of objects that have been allocated and are still (at least potentially) in use.
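To see why allocation can be so cheap under this kind of scheme, here is a minimal C++ sketch of "bump" allocation, where handing out memory is little more than advancing an offset. It is only an illustration under assumptions, not a description of how the CLR is implemented; the names `BumpArena`, `allocate`, `reset`, and `used` are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical illustration: a fixed-size arena that hands out memory by
// advancing an offset. Nothing is freed individually; reset() releases
// everything at once.
class BumpArena {
public:
    explicit BumpArena(std::size_t bytes) : buffer_(bytes), offset_(0) {}

    // Returns `size` bytes aligned to `align` (which must be a power of two),
    // or nullptr if the arena does not have enough room left.
    void* allocate(std::size_t size,
                   std::size_t align = alignof(std::max_align_t)) {
        assert(align != 0 && (align & (align - 1)) == 0);
        const std::uintptr_t base =
            reinterpret_cast<std::uintptr_t>(buffer_.data());
        const std::uintptr_t current = base + offset_;
        // Round the current address up to the requested alignment.
        const std::uintptr_t aligned =
            (current + (align - 1)) & ~(static_cast<std::uintptr_t>(align) - 1);
        const std::size_t start = static_cast<std::size_t>(aligned - base);
        if (start > buffer_.size() || size > buffer_.size() - start) {
            return nullptr;  // exhausted: this sketch never grows
        }
        offset_ = start + size;  // the "bump"
        return reinterpret_cast<void*>(aligned);
    }

    // Releases every allocation at once by rewinding the offset.
    void reset() { offset_ = 0; }

    std::size_t used() const { return offset_; }

private:
    std::vector<unsigned char> buffer_;  // allocated once, up front
    std::size_t offset_;
};
```

The same idea also covers the "preallocate one large chunk" approach in native code: the arena is allocated once, and the hot path only moves an offset. The trade-off is that individual objects cannot be freed, only the whole arena.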

The allocations themselves also differ though. In native C++, you typically create most objects on the stack, where both allocating and releasing memory is extremely cheap. In managed code, you typically allocate a much larger percentage of memory dynamically, where it's garbage collected.

The translation to machine code takes place only once. If I am executing the same line of code numerous times, then I think this translation does not matter. Yes, the garbage collector point you made is an important one, as is the dynamic memory allocation. Is there a sample available where you preallocate a large chunk of memory to avoid dynamic malloc? Is it possible to do that in C#? — bsobaid, Jun 10 '10 at 19:37
@Jerry Coffin You are describing the behavior of the Java HotSpot VM. It first compiles (fast) on the first request, then detects hot loops (that's where the name comes from) and recompiles them with some not too bad optimizations. Unfortunately, the .NET CLR (4) still does not do so. — Haymo Kutschbach, Feb 23 '12 at 09:42
@bsobaid, yes, one could set aside a large chunk of memory to avoid malloc by using large global variables; however, local variables in a function usually work faster for small data. Extreme situations require extreme solutions, though, and you need to test those. It's hard to tell unless you have tested it. — Peter, Mar 20 '17 at 21:54

score 2 · Answer 2 · answered Jun 10 '10 at 16:54

You can write slow code in any language; conversely, you can use decent algorithms that may well be fast in almost any language.

The common answer here would be to pick a language that you already know, use appropriate algorithms, then profile the heck out of it to determine the actual hot spots.

I am somewhat concerned about the hundreds of events every millisecond statement. That's an awful lot. Are you reasonably going to be able to do the processing you expect in any language?

As a C++ developer on high-performance systems, I tend to trust my ability to profile and optimize the emitted code. That said, there are very high performance .NET applications, where the writer has gone to great lengths to not do dynamic memory allocation inside the critical loops - mostly by using allocated pools of objects created beforehand.
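As a rough illustration of the "pools of objects created beforehand" idea, here is a minimal C++ sketch. The `Event` type, the `EventPool` class, and its member names are hypothetical and exist only for this example; a production pool would also need to think about thread safety and about what to do when the pool runs dry.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical event type, used only for this illustration.
struct Event {
    long long timestamp = 0;
    double price = 0.0;
};

// A fixed-capacity pool: every Event is created up front, and
// acquire()/release() only move pointers on and off a free list.
class EventPool {
public:
    explicit EventPool(std::size_t capacity) : storage_(capacity) {
        free_.reserve(capacity);
        for (Event& e : storage_) {
            free_.push_back(&e);
        }
    }

    // Returns nullptr when the pool is exhausted instead of allocating.
    Event* acquire() {
        if (free_.empty()) {
            return nullptr;
        }
        Event* e = free_.back();
        free_.pop_back();
        return e;
    }

    // Hands an object back to the pool; the caller must not use it afterwards
    // and must release each acquired object exactly once.
    void release(Event* e) {
        assert(e != nullptr);
        *e = Event{};        // reset to a clean state for the next user
        free_.push_back(e);  // capacity was reserved, so no reallocation
    }

    std::size_t available() const { return free_.size(); }

private:
    std::vector<Event> storage_;  // never resized after construction
    std::vector<Event*> free_;    // objects currently not in use
};
```

In the critical loop, code would call `acquire()` for each incoming event and `release()` once it has been processed, so no call to the general-purpose allocator happens on the hot path.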

So to repeat my previous comment: pick what you already know, then tune. Even if you hit a dead end, you will likely know much more about your problem space.

sdg

"dynamic memory allocation inside the critical loops - mostly by using allocated pools of objects created beforehand." A little off-topic, but is it possible to do this using C#? Are there any samples available to do this in C++? "hundreds of events every millisecond" You do get this many when you are parsing market data feeds from different exchanges – bsobaid Jun 10 '10 at 18:42
C++: have a look at boost::pool. C#: I am not as conversant, but I understand it can be (and has been) done – sdg Jun 10 '10 at 19:50
Thanks, that was a useful lead. pool has it. Boost is not famous for its speed. Do you use it for your high-performance applications? I am mainly a C# developer, but now I am stepping into the C++ world. It's a must for HFT developers. – bsobaid Jun 10 '10 at 21:50
In the worst case you can always implement your own memory pool allocators, so you don't depend on any library. Obviously this approach has its disadvantages as well. – rkachach Jun 21 '18 at 16:10

score 2 · Answer 3 · answered Jun 10 '10 at 17:01

It all depends on the situation.

Things that make unmanaged code faster / managed code slower:

the code needs to be converted to machine code before it can be executed
garbage collection might cause an overhead
calls from managed to unmanaged code have a serious overhead
unmanaged compilers can optimize more since they directly generate machine code (seen myself)

Things that make managed code faster / unmanaged code slower:

since the code is converted to machine code right before it's used, managed code can be optimized for the actual processor (with unmanaged code you have to target the 'minimum-supported' processor).

And probably there are many more reasons.

Patrick

"the code needs to be converted to machine code before it can be executed" But it is a one-time thing; it doesn't affect overall performance, does it? – bsobaid Jun 10 '10 at 18:43
Depends on how often you execute the same code (only once or millions of times). In practice it probably won't matter. – Patrick Jun 10 '10 at 19:00
Unmanaged code can be made processor-specific by branching at bottlenecks depending on runtime detection of the processor. The Intel C++ compiler can do this automatically, for example, though only for specific types of instruction and (controversially) only for Intel CPUs. – Sideshow Bob Dec 14 '11 at 17:05
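The branching described in the last comment can be sketched by hand as a function pointer chosen once at startup. This is only a sketch under assumptions: how the processor is actually detected is compiler- and platform-specific, so it is left here as a plain `bool` input, and `sum_wide` is a portable stand-in for a routine that would really use wider instructions. All names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Baseline version: safe on any processor.
long long sum_baseline(const int* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    long long total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += data[i];
    }
    return total;
}

// Stand-in for a processor-specific version. A real one would use wider
// instructions; here it is simply unrolled by four so the sketch stays
// portable and gives the same result as the baseline.
long long sum_wide(const int* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    long long t0 = 0, t1 = 0, t2 = 0, t3 = 0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        t0 += data[i];
        t1 += data[i + 1];
        t2 += data[i + 2];
        t3 += data[i + 3];
    }
    for (; i < n; ++i) {
        t0 += data[i];  // leftover elements
    }
    return t0 + t1 + t2 + t3;
}

using SumFn = long long (*)(const int*, std::size_t);

// Picks an implementation once. `cpu_has_wide_simd` is assumed to come from
// whatever runtime detection the platform offers.
SumFn select_sum(bool cpu_has_wide_simd) {
    return cpu_has_wide_simd ? &sum_wide : &sum_baseline;
}
```

The program would call `select_sum` once during startup and store the result, so the hot loop pays only for an indirect call rather than for re-checking the processor each time.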

score 2 · Answer 4 · answered Jan 22 '12 at 22:07

Managed code is in most cases slower than unmanaged code, even though the .NET CLR always JIT-compiles the code before executing it (it is not compiled multiple times while the program is running, but it will never interpret the code).

The problem is rather with many checks the CLR does, e.g. to see if you run over the bounds of an array whenever you try to access it. This leads to fewer problems with buffer overflows, etc. but also means a performance hit due to the added overhead of those checks.
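Native C++ lets you choose between the two behaviors, which makes the trade-off easy to see. The sketch below contrasts `std::vector::at()`, which checks the index and throws `std::out_of_range`, with `operator[]`, which does not check. The helper names (`sum_checked`, `sum_unchecked`, `is_rejected`, `bounds_demo`) are made up for this example, and whether the check costs anything measurable depends on the compiler and the loop.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Checked access: at() validates the index on every call and throws
// std::out_of_range if it is invalid.
long long sum_checked(const std::vector<int>& v) {
    long long total = 0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        total += v.at(i);
    }
    return total;
}

// Unchecked access: operator[] performs no validation; an out-of-range
// index is undefined behavior.
long long sum_unchecked(const std::vector<int>& v) {
    long long total = 0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        total += v[i];
    }
    return total;
}

// Returns true if reading index `i` through at() is rejected.
bool is_rejected(const std::vector<int>& v, std::size_t i) {
    try {
        (void)v.at(i);
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}

// Small usage check: both loops agree, and a bad index is caught.
void bounds_demo() {
    const std::vector<int> v{1, 2, 3};
    assert(sum_checked(v) == sum_unchecked(v));
    assert(is_rejected(v, 3));
}
```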

I've seen experiments where C# outperformed C++, but those were conducted with code taking heavy advantage of object hierarchies, etc. When it comes down to number crunching and you want to get the most out of your PC, you will have to go with unmanaged code.

Another point that was already mentioned: the GC leads to somewhat unpredictable pauses in the program's execution when memory must be freed. You need this time as well when doing memory management in unmanaged code, but it occurs more often, whenever you decide to destroy an object, which means it's not all done at once for the whole program, so you don't have a long pause.
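A small C++ sketch of that last point: objects are destroyed at a known place in the code, so the cleanup cost is paid piece by piece rather than in one later pause. The `Order` type and its `live` counter are hypothetical and exist only to make the destruction points observable.

```cpp
#include <cassert>
#include <memory>

// Hypothetical type that counts how many instances are currently alive.
struct Order {
    static int live;
    Order() { ++live; }
    ~Order() { --live; }
    Order(const Order&) = delete;
    Order& operator=(const Order&) = delete;
};
int Order::live = 0;

// Returns how many additional Order objects were alive inside the inner scope.
int live_inside_scope() {
    const int before = Order::live;
    int seen = 0;
    {
        Order on_stack;                               // destroyed at the closing brace
        std::unique_ptr<Order> on_heap(new Order());  // freed at the closing brace too
        seen = Order::live - before;
    }
    // Both objects are already gone here; nothing is left for a later sweep.
    assert(Order::live == before);
    return seen;
}
```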

The key to performance - especially, but not only, for number crunching - is memory management. This is the bottleneck! If you manage to get more efficient memory usage, you will get more execution speed. Managed languages bring all the options to do so, plus more convenient syntax for the user (IMPO). Therefore, your statement seems too general. See: http://stackoverflow.com/a/9327983/1215993 — Haymo Kutschbach, Feb 23 '12 at 09:47

score 0 · Answer 5 · answered Jun 10 '10 at 17:30

There are many good answers here, but one aspect of managed code that may give it an advantage in the long term is runtime analysis. Since the code generated by the managed compiler is an intermediate format, the machine code that actually executes can be optimized based on actual usage. If a particular subset of functionality is heavily used, the JIT'er can localize the machine code all on the same memory page, increasing locality. If a particular sub-call is made repeatedly from a particular method, a JIT'er can dynamically inline it.

This is an improvement over unmanaged code, where inlining must be "guessed" ahead of time, and excessive inlining is harmful because it bloats code size and causes locality issues that cause (very time-expensive) L2/L1 cache misses. That information is simply not available to static analysis, so it is only possible in a JIT'ing environment. There's a goody basket of possible wins from runtime analysis such as optimized loop unwinding, etc.

I'm not claiming the .NET JIT'er is as smart as it could be, but I know I've heard about global analysis features and I know a lot of research into runtime analysis has been done at Hewlett-Packard and other companies.

David Gladfelter

A basic question: by run-time analysis, do you mean profiling? How do you do run-time analysis of your code? – bsobaid Jun 10 '10 at 19:49
One implementation would be for the .NET framework to begin execution of a managed assembly by interpreting the CLR byte codes and note frequency of execution of opcodes, high correlation between the execution of a routine and execution of a subroutine from that routine, etc, and then generate machine code taking advantage of that knowledge to minimize overhead (call stack construction/destruction, loop variable incrementing and jumps, fragmented "hot" memory regions, etc.) in frequently-executed operations. – David Gladfelter Jun 10 '10 at 20:52
That would be a very good way of tuning the code, but a very hard one for me to do... such as noting the execution frequency of opcodes, etc. – bsobaid Jun 10 '10 at 21:56
To be fair, if you're using native code you could use PGO in VC++ (presumably other toolsets have something like it) to do profile-guided optimization of the app. You're speculating this might exist for managed - I know for a fact it exists for at least one native toolset. – Kate Gregory Jun 17 '10 at 14:24

score -1 · Answer 6 · answered Nov 24 '10 at 20:53

First, your statement "processes hundreds of events every millisecond" sounds quite unrealistic. Unless you have a specially designed clock module in the computer, I do not think that you can achieve that goal with a generic PC (typical resolution is around 10 milliseconds). Second, native C++ is vastly better in terms of performance. There are a lot of optimizations that can be made in C++ to speed things up that are not possible in managed code. Also be aware that garbage collection in managed code makes performance unpredictable: when the GC fires, the whole process freezes. Once you run into that problem, the solution is more painful, and all the "nice style" offered by managed code will be gone.

As for managed code's ability to optimize for the CPU, it is true, but you can take advantage of CPU features (SSE2, MMX, etc.) in native C++ too. Based on my experience, the performance boost is negligible.

"There are a lot of optimization can be taken in term of C++ to speed up, while in managed code they are not possible. " - This is wrong. Managed code does not mandate the use of the garbage collector, it only uses it by default. The same optimizations that are gained by manual memory usage are available for use in C#. You can debug memory, registers, processor flags, and manipulate the stack in C# just like you can with C++. It's possible to even execute raw ASM if you jump through some hoops. — marknuzz, Jun 04 '16 at 00:22

score -1 · Answer 7 · answered Apr 05 '20 at 20:51

In order of speed and power: asm > C > C++ >= C++/CLI > C# >= all others. But creating a web service in asm is a long pain. So use the right language for the right job and the right audience to do the best job in the given time.

zep

score -1 · Answer 8 · answered Jul 20 '11 at 07:24

Writing fast code is always a pain. The main issue is that you can optimize for just one platform. That works on consoles, embedded systems, or other platforms where the hardware is always the same. In the real PC world this isn't the case: different cores, different instruction sets, etc. make this a nightmare. This is, IMHO, the main issue that really makes the difference between managed and unmanaged code. Managed code can be optimized for the new platform when it runs. Unmanaged code cannot; it is set in stone.

score -3 · Answer 9 · answered Jun 10 '10 at 16:49

Isn't C++/CLI a half interpreted language like Java?

Also, didn't someone post a study just yesterday that showed that GC systems are always slower than non GC?

Edward Strange
