Regardless of the interop technique used, special transition sequences, called thunks, are required each time a managed function calls an unmanaged function and vice versa. These thunks are inserted automatically by the Visual C++ compiler, but it is important to keep in mind that cumulatively, these transitions can be expensive in terms of performance.
However, surely the CLR calls C++ and Win32 functions all the time. To deal with files, the network, windows, and nearly anything else, it has to call unmanaged code. How does it escape the thunking penalty?
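To be concrete, here is the kind of call I have in mind (just a sketch to show where a transition happens, not part of the timed experiment below; the function name is made up). With C++ interop (/clr), the compiler emits the thunks at each call into native code:

    #include <windows.h>

    // This function compiles to MSIL, but GetTickCount and Sleep are native
    // Win32 exports, so each call below goes through a managed-to-unmanaged
    // thunk (and back again when the native function returns).
    #pragma managed
    void ManagedCallsWin32() {
        DWORD ticks = GetTickCount();  // managed -> unmanaged transition
        Sleep(0);                      // another transition on every call
    }

In a pure C# program the same kind of transition happens under the hood whenever the base class library P/Invokes a Win32 API for file I/O, sockets, window handles, and so on.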
Here's an experiment written in C++/CLI that may help illustrate my question:
    #include <cmath>

    using namespace System;
    using namespace System::Diagnostics;

    #define REPS 10000000

    // Compiled as native code: a pure C++ loop calling the CRT's pow().
    #pragma unmanaged
    void go1() {
        for (int i = 0; i < REPS; i++)
            pow(i, 3);
    }

    // Compiled as managed code (MSIL), but still calling the native CRT pow(),
    // so every iteration crosses the managed/unmanaged boundary.
    #pragma managed
    void go2() {
        for (int i = 0; i < REPS; i++)
            pow(i, 3);
    }

    // Managed code calling .NET's Math::Pow - no visible interop at all.
    void go3() {
        for (int i = 0; i < REPS; i++)
            Math::Pow(i, 3);
    }

    public ref class C1 {
    public:
        static void Go() {
            auto sw = Stopwatch::StartNew();
            go1();
            Console::WriteLine(sw->ElapsedMilliseconds);
            sw->Restart();
            go2();
            Console::WriteLine(sw->ElapsedMilliseconds);
            sw->Restart();
            go3();
            Console::WriteLine(sw->ElapsedMilliseconds);
        }
    };

    // C1::Go() is called from a C# app that references this C++/CLI assembly.
The results are (consistently):
405 (go1 - pure C++)
818 (go2 - managed code calling C++)
289 (go3 - pure managed)
Why go3 is faster than go1 is a bit of a mystery, but that's not my question. My question is this: comparing go1 and go2 shows that the thunking penalty adds roughly 400 ms. How does go3 escape that penalty, since it calls into C++ to do the actual calculation?
Even if this experiment is invalid for some reason, my question remains: does the CLR really pay a thunking penalty every time it calls into C++/Win32?