The `calli` opcode requires a calling convention. By default it is stdcall, while extern "C" in native libraries uses cdecl.

The JIT recently started inlining methods that contain `calli`, but only with the default calling convention. When I call a method with `calli` without `unmanaged cdecl`, it works on x64 and is 58% faster than DllImport and 2.2x faster than an unmanaged function pointer (on netcoreapp2.1; on net471 the difference is bigger: 82% and 5.5x). When I run a method with `calli unmanaged cdecl`, performance is on par with DllImport (around 1% slower).

I have read that on x64 the stdcall vs. cdecl mess no longer exists and all methods use cdecl (or fastcall; I saw that elsewhere but cannot find the link). The difference only applies to x86, where my call without `unmanaged cdecl` does indeed crash the app with a segfault.

The method in question is the following. For tests I use a no-op native method, so that only the native call overhead is measured.

  .method public hidebysig static int32 CalliCompress(uint8* source, native int sourceLength, uint8* destination, native int destinationLength, int32 clevel, native int functionPtr) cil managed aggressiveinlining
  {
    .custom instance void System.Runtime.Versioning.NonVersionableAttribute::.ctor()
             = {}
    // Load the five native arguments, then the function pointer that calli consumes.
    .maxstack  6
    ldarg.0
    ldarg.1
    ldarg.2
    ldarg.3
    ldarg 4
    ldarg 5
    calli unmanaged cdecl int32 (uint8* source, native int sourceLength, uint8* destination, native int destinationLength, int32 clevel) 
    ret
  }
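
For context, the native side of such a test can be sketched in C roughly as follows. The native source is not shown in the question, so the names `noop_compress` and `call_through_pointer` are hypothetical; the call through a function pointer is only a C analogue of what `calli` does with `functionPtr`. With a typical C compiler's default settings on x86, a function declared like this uses cdecl.

```c
#include <stdint.h>

/* Hypothetical no-op native function with the same parameter list as the
   calli signature above: it ignores its arguments and returns 0. */
int32_t noop_compress(uint8_t *source, intptr_t source_length,
                      uint8_t *destination, intptr_t destination_length,
                      int32_t clevel)
{
    (void)source;
    (void)source_length;
    (void)destination;
    (void)destination_length;
    (void)clevel;
    return 0;
}

/* Function pointer type matching the signature above. */
typedef int32_t (*compress_fn)(uint8_t *, intptr_t, uint8_t *, intptr_t, int32_t);

/* Calls the target indirectly, the way calli calls through functionPtr. */
int32_t call_through_pointer(compress_fn fn,
                             uint8_t *source, intptr_t source_length,
                             uint8_t *destination, intptr_t destination_length,
                             int32_t clevel)
{
    return fn(source, source_length, destination, destination_length, clevel);
}
```

Because the function does no work, any timing difference between call styles comes from the call transition itself.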

My questions:

1) Is it safe to omit `unmanaged cdecl` after `calli` on x64 "by design", or am I just lucky with this example? If all calls on x64 are cdecl, then I could rely on the JIT treating static readonly fields as constants to dispatch to the appropriate method for free, just using if (IntPtr.Size == 8) {..call fast method..} else {..use unmanaged cdecl..}

2) What does "caller or callee cleans the stack" mean? My native function returns an int that is on the stack after the call. Is the issue about who removes this int from the stack? Or is there some other work that needs to be done with the stack inside the native function? I control the native function and could return the value via a ref parameter. Would that make the stack-cleaning issue irrelevant, since no stack changes are made during the call?

V.B.
  • In effect, the runtime only supports one calling convention on x64, which is "the" calling convention. `cdecl`, `stdcall`, `fastcall` and `thiscall` are all the same. See [here](https://blogs.msdn.microsoft.com/oldnewthing/20040114-00/?p=41053/) for more details. Return values are passed in the RAX register, they don't go through the stack (unless the parameter exceeds 64 bits, but that's not relevant for CLR calls), but in any case you don't need to worry about cleaning the stack, since the CLR will emit the appropriate code. – Jeroen Mostert Sep 11 '18 at 11:01
  • @JeroenMostert so on x64 I could always use inlineable `calli` without `unmanaged` and any calling convention? Is this true for Linux x64 as well? – V.B. Sep 11 '18 at 11:05
  • The x64 calling convention on Linux is not identical to that of Windows (different registers are used), but it's still true that there's "one" calling convention, so I'd expect the answer to be "yes". But I have no experience with unmanaged interop on Linux. – Jeroen Mostert Sep 11 '18 at 11:11
  • @JeroenMostert it works in Docker, with default `calli` surprisingly 5x faster than DllImport (in Release+F5 mode from VS). Will have to ask in the coreclr repo whether this is guaranteed to work rather than "happens to work". – V.B. Sep 11 '18 at 11:41
  • `calli` can be used both for calling managed and unmanaged functions. Omitting `unmanaged` causes it to treat the function as managed, which *might* work if the JITter happens to call managed functions with the same signature in the same way, but could fail miserably if it uses some other calling convention (which is an implementation detail). – IS4 Sep 15 '18 at 15:12
  • @IllidanS4 have you read the question? The question is exactly does it work by design and do we need calling convention on x64? It not only *might* but already does work, with perf numbers given above. Also, calli cannot call managed function, it calls native compiled code. From the first link: `the method entry pointer is assumed to be a specific pointer to native code... Such a pointer can be created using the Ldftn or Ldvirtftn instructions, or passed in from native code.` For managed functions there is `call` and `callvirt`. – V.B. Sep 16 '18 at 17:16
  • @V.B. `calli` can very easily call a managed function, if provided a pointer from `ldftn`. Notice that `ldftn` returns a pointer to native code for a *managed* function, with the managed calling convention. You cannot call the pointer from `ldftn` from C, for example, since the function still uses the managed calling convention. If it works for an unmanaged calling convention, it works so by accident. – IS4 Sep 16 '18 at 17:22
  • Did you add the `SuppressUnmanagedCodeSecurity` attribute and `SetLastError = false`? Those make a huge difference. – TakeMeAsAGuest Jan 30 '19 at 15:36

1 Answer

The answer is, it is not safe. See this discussion at dotnet/coreclr: https://github.com/dotnet/coreclr/issues/19997

Glenn Slayden
V.B.