Calling convention to use for max. portability between x86 systems

Question

I am working on a set of self-contained x86 assembly routines that I would like to make available to C programs on the following systems:

Linux 64-bit only
Windows 32-bit and 64-bit
(Good to have ultimately, Mac 64-bit, but this is not clear as Apple appear to be on their way to drop x86 in favour of ARM)

I use LLVM in some other capacity already and it is almost certain that I would use clang rather than gcc although I can envisage a situation of someone's wanting to compile the whole of it using gcc. The assembler will be NASM.

I develop both the routines and a C library that exposes them to users, i.e. everything is under my control and I can design everything as needed.

I expect that some users will actually use C++ but they will still link to the C library - that is, not with the assembly routines directly.

As I am new to assembly, I am in the process of discovering a wonderful maze of various calling conventions spread across systems, compilers, vendors, calling variants and languages. I cannot say that it does not make for interesting reads sometimes but I cannot say either that it is not confusing to beginners.

My take after reading up on it all is that I can simply start with cdecl for maximum portability in the initial version and then think about special-casing other conventions if the need arises. Depending on what the routines actually do, I may be able to make things faster by using other conventions in specific cases.

But initially, as I would like to have something that works correctly and then optimise it even further - is it correct to say that settling on cdecl will offer maximum portability across the systems that I listed? Thank you.

The calling conventions between x86 and x64 are very different and so are the assembly instructions. Anyway I guess you need different assembly code for x86 and x64 anyway. Also google "calling convention", there is a lot of information available and the subject is vast. — Jabberwocky, Sep 08 '20 at 10:11
Thanks @Jabberwocky but I do not understand why you interpreted my question as my not having done my homework already? This is a genuine puzzle to me - I did not ask about differences between 32-bit and 64-bit, I did read about various conventions and I asked for a confirmation if my summary is correct but you tell me "google up some"? — , Sep 08 '20 at 10:15
Note that even on x64, the [calling conventions](https://en.wikipedia.org/wiki/X86_calling_conventions#x86-64_calling_conventions) differ between Linux and Windows. For instance, first 32-bit integer argument is passed in `edi` on Linux, but in `ecx` on Windows: https://godbolt.org/z/Wen3nv. — Daniel Langr, Sep 08 '20 at 10:16
x86-64 Linux and MacOS both use the x86-64 System V ABI. Windows uses its own calling convention. None of these x86-64 platforms call it "cdecl". You'll want two versions of your library, probably built with asm macros to adapt the tops of your functions for different calling conventions. Agner Fog's calling convention guide has some suggestions for dealing with portability of hand-written asm. https://www.agner.org/optimize/ — Peter Cordes, Sep 08 '20 at 10:16
Right @PeterCordes - so essentially, there is no one thing really to settle on? I am fine with this being the answer, I am just not sure if I understand it correctly. — , Sep 08 '20 at 10:19
@Jabberwocky I think you wrote a hasty comment to which I posted a hasty reply for which I apologise because I misread part of what you wrote. But I really thought SO would not be one of the places with "just search the web" kind of answers. Thanks for your understanding. — , Sep 08 '20 at 15:16
@W.Chang - Can you please post a bit more information in what way this settled? — , Sep 08 '20 at 15:17
@Terry - As Peter answered below, if you are developing 64-bit programs for Windows, use the Win64 ABI. If you are developing 64-bit programs for Linux/macOS, use the System V AMD64 ABI. Virtually all existing libraries and compilers use them. For 32-bit programs, you need to decide which one to use. You can use cdecl/stdcall/vectorcall or even Delphi's register ABI, as long as your function declaration is consistent with the library. The 32-bit ABIs are messy because they have to support existing libraries that use different ABIs. The 64-bit ABIs are not too bad. Of course it would be better if there were only one. — W. Chang, Sep 09 '20 at 16:03
@Terry - NASM's macro support is quite powerful. You can define .p1 use `-d` in command line to define a symbol. Then — W. Chang, Sep 09 '20 at 16:08
@Terry - (Ignore above. Hit `Enter` by mistake) NASM's macro support is quite powerful. You can define `.p1` as 'RCX' in Win64 mode and as `RDI` in System V mode. Then you can kind of write code for both platforms. You can use `-d` in command line to define a symbol to tell NASM which mode. I wrote a 1200-line macro set to do that and more. I can actually write a single code and NASM compiles it in 5 modes: Delphi, stdcall, cdecl, Win64 and System V. It sounds impossible but NASM really impressed me. — W. Chang, Sep 09 '20 at 16:15
Thanks @W.Chang - having posted my question, I gave it some more thought and it is possible that I will be able to get away with supporting only 64-bit systems, which should simplify things. It is great to hear good things about NASM, this is what I settled on ultimately myself. I will leave the question and answer as they are because over time perhaps someone else will need the 32-bit parts of it. — , Sep 09 '20 at 17:09
@Terry - Please take a look at this: http://rvelthuis.de/articles/articles-nasm.html . Rudy came up with the same idea as mine (probably earlier than me). I believe his macros support the Win64 and Delphi register ABIs. I found his work when I was almost done with my own. I did borrow his idea of the PARAM macro. (His code should work, but I didn't spend enough time to get it working myself.) — W. Chang, Sep 09 '20 at 17:46
@W.Chang This is interesting, I will check it out. Slightly above my current level but this is even better. Thanks again! — , Sep 09 '20 at 18:05

Peter Cordes · Accepted Answer · 2020-09-08T10:24:54.500

x86-64 Linux and MacOS both use the x86-64 System V ABI. Windows uses its own calling convention. None of these x86-64 platforms call it "cdecl".

The normal approach is for your library to use the standard calling convention for the target platform, which means different asm for each one. One way to handle this is with asm macros to adapt the tops of your functions for different calling conventions. Another is to parameterize register names, like ARG1 instead of hard-coding RDI, but that gets very complicated very fast if your functions are more than trivial pointer increments, or if you ever use a register for something other than a function arg.
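To make the ARG1 idea concrete, here is a small C sketch of the mapping such macros would have to encode for the two 64-bit conventions. The names `enum abi` and `int_arg_register` are made up for illustration, and it assumes every argument is an integer or pointer (floating-point args use XMM registers and are not modeled here).

```c
#include <stddef.h>

/* Hypothetical names, for illustration only. */
enum abi { ABI_SYSV64, ABI_WIN64 };

/* Integer/pointer argument registers, in argument order. */
static const char *const sysv64_int_args[] = { "rdi", "rsi", "rdx", "rcx", "r8", "r9" };
static const char *const win64_int_args[]  = { "rcx", "rdx", "r8", "r9" };

/* Returns the register that holds integer argument `index` (0-based),
   or "stack" once the register arguments are exhausted.
   Assumes all arguments are integers or pointers. */
const char *int_arg_register(enum abi abi, size_t index)
{
    const char *const *regs = (abi == ABI_WIN64) ? win64_int_args : sysv64_int_args;
    size_t count = (abi == ABI_WIN64) ? 4 : 6;
    return index < count ? regs[index] : "stack";
}
```

For example, `int_arg_register(ABI_SYSV64, 0)` gives `"rdi"` while `int_arg_register(ABI_WIN64, 0)` gives `"rcx"`. Note that RCX and RDX appear in both lists at different positions, which is one reason a simple rename of ARG1..ARGn gets messy as soon as a function uses those registers for anything else.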

On 32-bit Windows you have a choice of multiple conventions; fastcall / vectorcall are the two that suck the least. On every other x86 32 and 64-bit platform, there's one standard calling convention. It'll be easier for people to use your library if you follow it.

Agner Fog's calling convention guide has some more detailed suggestions for dealing with portability of hand-written asm. https://www.agner.org/optimize/

You could in theory use x86-64 System V everywhere, but then on Windows MSVC would be unable to emit calls to your code. (GNU C compatible compilers like gcc, clang, and ICC could use __attribute__((sysv_abi)) in the prototypes on Windows where their default calling convention is what MS names x64 fastcall).

I guess you could use x86-64 fastcall everywhere and use __attribute__((ms_abi)) in your prototypes for non-MSVC compilers. But that may cost some performance overhead, especially if you want to use all the XMM regs. (xmm6..15 are call-preserved in x64 fastcall). But beware of compiler bugs; using non-default calling conventions is not nearly as well tested.
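A rough sketch of what the prototypes could look like if you did pick one 64-bit convention for every platform. `MYLIB_ABI`, `MYLIB_USE_MS_ABI` and `mylib_sum5` are hypothetical names, and the C body is only a stand-in for a routine that would really live in a NASM file.

```c
#include <stdint.h>

/* Hypothetical switch: define MYLIB_USE_MS_ABI to use the Windows x64
   convention everywhere; otherwise x86-64 System V is requested. */
#if defined(__x86_64__) && defined(__GNUC__)   /* gcc, clang, ICC */
#  if defined(MYLIB_USE_MS_ABI)
#    define MYLIB_ABI __attribute__((ms_abi))
#  else
#    define MYLIB_ABI __attribute__((sysv_abi))
#  endif
#else
   /* MSVC (or a non-x86-64 target): no way to ask for another
      convention, so the platform default is used. */
#  define MYLIB_ABI
#endif

/* Prototype as it would appear in the library header. */
MYLIB_ABI uint64_t mylib_sum5(uint64_t a, uint64_t b, uint64_t c,
                              uint64_t d, uint64_t e);

/* Stand-in C body so the sketch compiles on its own; the real
   implementation would be hand-written asm using the chosen convention. */
MYLIB_ABI uint64_t mylib_sum5(uint64_t a, uint64_t b, uint64_t c,
                              uint64_t d, uint64_t e)
{
    return a + b + c + d + e;
}
```

With five integer args, the fifth one is passed on the stack under `ms_abi` but in R8 under `sysv_abi`, so the asm side has to be written for whichever convention the header selects.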

If all your functions have 4 or fewer total register args, x64 fastcall is not too bad a calling convention in most respects. Otherwise, having more register args available is usually more efficient. See also: Why does Windows64 use a different calling convention from all other OSes on x86-64?

32-bit and 64-bit are obviously vastly different; none of the standard calling conventions are compatible between 32 and 64-bit code, and your code will usually need to be pretty different anyway.

The only real similarity is between 32-bit Windows fastcall and the standard 64-bit Windows calling convention (which MS also calls fastcall), but 32-bit fastcall only passes the first 2 args in regs, and is callee-pops stack args. 64-bit fastcall passes the first 4 args in regs, starting with the same 2 but then using r8 and r9 which only exist in 64-bit mode.
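On the C side, requesting 32-bit fastcall could look something like the sketch below. `MYLIB_FASTCALL` and `mylib_mul_add` are hypothetical names, and the C body again stands in for what would be a NASM routine.

```c
#include <stdint.h>

/* Only request fastcall on 32-bit x86; on 64-bit targets the macro
   expands to nothing and the platform's standard convention applies. */
#if defined(_MSC_VER) && defined(_M_IX86)
#  define MYLIB_FASTCALL __fastcall
#elif defined(__GNUC__) && defined(__i386__)
#  define MYLIB_FASTCALL __attribute__((fastcall))
#else
#  define MYLIB_FASTCALL
#endif

/* 32-bit fastcall: a in ECX, b in EDX, c on the stack, popped by the callee.
   Windows x64:     a in ECX, b in EDX, c in R8D. */
uint32_t MYLIB_FASTCALL mylib_mul_add(uint32_t a, uint32_t b, uint32_t c)
{
    return a * b + c;
}
```

So an asm body that only touches its first two args could look alike for 32-bit fastcall and Windows x64, but anything that reads a third arg or returns with stack args pending has to differ.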

Thanks for the reply, this is some real food for thought. After posting my question, I started to ask myself if I truly need to support Windows 32-bit. I have been using Linux exclusively for many years and I just do not know much about Windows anymore but it looks that this system will become 64-bit only rather sooner than later. I just want to make life easier for the users of my software and, because they will be developers I have realised I may expect that they will be on 64-bit anyway. — , Sep 08 '20 at 15:22
I will not be updating the question so as not to make it unclear why in your answer you talk about 32-bit. Besides, there may be other people in the future looking for this information too. Thank you again. — , Sep 08 '20 at 15:24
