The calli opcode requires a calling convention. By default it is stdcall, while extern "C" in native libraries uses cdecl.
The JIT recently started allowing methods that contain calli to be inlined, but only with the default calling convention. When I call a method with calli without unmanaged cdecl, it works on x64, and it is 58% faster than DllImport and 2.2x faster than an unmanaged function pointer. (That is on netcoreapp2.1; on net471 the difference is bigger: 82% and 5.5x.) When I run the method with calli unmanaged cdecl, performance is on par with DllImport (around 1% slower).
I have read that on x64 there is no longer a mess with stdcall vs cdecl and all methods use cdecl (or fastcall; I saw that somewhere else but cannot find the link). The difference only applies to x86, where my call without unmanaged cdecl does indeed crash the app with a segfault.
The method in question is shown below. For the tests I use a no-op native method, so that I measure only the native call overhead.
.method public hidebysig static int32 CalliCompress(uint8* source, native int sourceLength, uint8* destination, native int destinationLength, int32 clevel, native int functionPtr) cil managed aggressiveinlining
{
    .custom instance void System.Runtime.Versioning.NonVersionableAttribute::.ctor() = {}
    .maxstack 6
    ldarg.0
    ldarg.1
    ldarg.2
    ldarg.3
    ldarg 4
    ldarg 5
    calli unmanaged cdecl int32 (uint8* source, native int sourceLength, uint8* destination, native int destinationLength, int32 clevel)
    ret
}
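For reference, the native side of such a test could look roughly like the C sketch below. The names noop_compress, compress_fn, call_through_pointer and NATIVE_CALL are hypothetical and not taken from my actual library, the return value 0 is an arbitrary choice, and the export decoration (for example __declspec(dllexport) on Windows) is left out. The parameter list mirrors the calli signature above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical macro: spell out cdecl where the compiler has a keyword or
   attribute for it, and expand to nothing elsewhere. */
#if defined(_MSC_VER)
#define NATIVE_CALL __cdecl
#elif defined(__i386__)
#define NATIVE_CALL __attribute__((cdecl))
#else
#define NATIVE_CALL
#endif

/* Hypothetical no-op function: it ignores its arguments and returns 0,
   so a benchmark measures only the cost of the call itself. */
int32_t NATIVE_CALL noop_compress(uint8_t *source, intptr_t source_length,
                                  uint8_t *destination, intptr_t destination_length,
                                  int32_t clevel)
{
    (void)source;
    (void)source_length;
    (void)destination;
    (void)destination_length;
    (void)clevel;
    return 0;
}

/* Function pointer type with the same signature and calling convention. */
typedef int32_t (NATIVE_CALL *compress_fn)(uint8_t *, intptr_t,
                                           uint8_t *, intptr_t, int32_t);

/* Calls through a raw function pointer, loosely analogous to what calli
   does with functionPtr on the managed side. */
int32_t call_through_pointer(compress_fn fn)
{
    return fn(NULL, 0, NULL, 0, 0);
}
```

The address of noop_compress is what would be passed as functionPtr to CalliCompress.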
My questions:
1) Is it safe "by design" to omit unmanaged cdecl after calli on x64, or am I just lucky with this example? If all calls on x64 are cdecl, then I could rely on the JIT treating static readonly fields as constants and dispatch to the appropriate method for free, just using if (IntPtr.Size == 8) { ..call fast method.. } else { ..use unmanaged cdecl.. }
2) What does "caller or callee cleans the stack" mean? My native function returns an int that is on the stack after the call. Is the issue about who removes this int from the stack? Or is there some other work that needs to be done with the stack inside the native function? I am in control of the native function and could return the value via a ref parameter. Would that make the stack-cleaning issue irrelevant, since no stack changes are made during the call?
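To make the alternative in question 2 concrete, the ref-parameter variant might look like this on the native side. This is only a sketch under the same assumptions as the rest of the post: noop_compress_out and NATIVE_CALL are hypothetical names, and the result is written through an extra pointer argument instead of being returned.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical macro: spell out cdecl where the compiler has a keyword or
   attribute for it, and expand to nothing elsewhere. */
#if defined(_MSC_VER)
#define NATIVE_CALL __cdecl
#elif defined(__i386__)
#define NATIVE_CALL __attribute__((cdecl))
#else
#define NATIVE_CALL
#endif

/* Hypothetical variant of the no-op function: returns void and stores the
   result (always 0 here) through the extra pointer argument. */
void NATIVE_CALL noop_compress_out(uint8_t *source, intptr_t source_length,
                                   uint8_t *destination, intptr_t destination_length,
                                   int32_t clevel, int32_t *result)
{
    (void)source;
    (void)source_length;
    (void)destination;
    (void)destination_length;
    (void)clevel;
    if (result != NULL) {
        *result = 0;
    }
}
```

On the managed side the calli signature would then return void and take one more pointer argument.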