I have an existing Unix project that I am porting to Windows. This includes a handful of assembler implementations. For example:

blake3.c

extern void blake3_compress_xof_sse2(P1, P2, P3, P4, P5, P6);

static void blake3_caller( ... ) 
{
    blake3_compress_xof_sse2(P1, P2, P3, P4, P5, P6);
}

blake3.s

.intel_syntax noprefix
.global blake3_compress_xof_sse2

.p2align 6
blake3_compress_xof_sse2:
        movups  xmm0, xmmword ptr [rdi] # P1
        ...

Because all the .s are from Unix, I've had to convert the argument passing from

Unix: rdi, rsi, rdx, rcx, r8, r9

Windows: rdx, rcx, r8, r9, rsp+0x28, rsp+0x30

(as well as saving rdi, rsi, and xmm6-xmm15)

Then I stumbled across __attribute__((sysv_abi)).

It would be very neat if I could just tell the compiler to use the Unix parameter order: rdi, rsi, rdx, rcx, r8, r9

So I define the assembler function in C, as:

extern void __attribute__((sysv_abi)) blake3_compress_xof_sse2(P1, P2, P3, P4, P5, P6);

but I am disappointed to see, when I dump the registers on entry to blake3_compress_xof_sse2, that P1 is still in rdx and not, as I had hoped, in rdi.

Is there some way to get this to work? It would be very beneficial to not have to tweak all the .s files for Windows. (or have a separate copy of .s files for Windows)

At compile time, I use clang to compile all the source files, .c and .s, into a libkern.a. Then at the end, I link a driver.c and the libkern.a with MSVC to produce the final driver. Oh yeah, this is in kernel mode. But since both blake3.c and blake3.s are inside libkern.a, I would not have thought that should matter.

lundman
  • The first argument is in `rcx` when using the Windows x64 ABI. I don't have a Windows VM with clang, so I cannot test it but the other way around (Linux clang with a ms_abi function) [works](https://godbolt.org/z/oY1f3e35K). So I'd expect a sys_v abi function to be correctly invoked by clang under Windows. Have you tried with a minimal example? Just to exclude secondary factors (like stale object files)? – Margaret Bloom Nov 17 '22 at 07:59
  • Also tried `__declspec(sysv_abi)` since I'm using clang-cl.exe. It compiles happily, no warnings, but still passes args in ms registers. I suppose a minimal example is the next step. I do wonder whether it matters that they are in separate files. – lundman Nov 17 '22 at 08:33
  • I was wondering that too, but the caller only looks at the declaration and that has the attribute modifier. – Margaret Bloom Nov 17 '22 at 08:43
  • The first instruction of a function in kernel mode is `movups xmm0, [rdi]`? Did the C caller already do `kernel_fpu_begin()` or the Windows equivalent? Also, `__attribute__((sysv_abi))` should work for callers that see that prototype. Have you tried regular clang, not `clang-cl`? I wonder if `clang-cl` maybe has a bug where it doesn't respect that attribute. – Peter Cordes Nov 17 '22 at 09:42
  • Note that a Windows x64 function calling a SysV function will have to save/restore RDI+RSI, and XMM6..15 because Windows x64 has way too many call-preserved XMM regs. So it's somewhat inefficient. If you're compiling with `-mcmodel=kernel` and/or `-mgeneral-regs-only`, I wonder if that makes the compiler unwilling to do that, and maybe gives up on respecting the attribute? – Peter Cordes Nov 17 '22 at 09:43
  • Some projects handle it by using macros instead of hard register names, so they can change things around. Or a couple extra instructions in a `%if` or whatever that `mov` regs around so the rest of the function has args where it wants them, even though that wastes instructions at run-time. – Peter Cordes Nov 17 '22 at 09:45
  • The Linux code does indeed run kernel_fpu_begin() - currently empty for the Windows port. All your notes are precisely why I was hoping to be able to use sysv_abi, and though the compiler accepts it, it does not yet change the registers used when it lands in the assembler. vs2019 is clang 12, and vs2021 is 15. I don't think it would be a particularly new feature though. – lundman Nov 17 '22 at 11:01
  • @PeterCordes I've been reading a bunch of your comments around here, and if I can't get this to work, maybe I could create a wrapper for it with inline assembly, something like: [gist](https://gist.github.com/lundman/f322b5be05d2c016feb5cc0df3ccd6be) except, you know, something that works. – lundman Nov 18 '22 at 07:25
  • My understanding is that the Windows kernel has some equivalent function you need to call if you're going to use XMM registers; the problem with clobbering XMM regs isn't just about the calling convention, it's about user-space. GCC or clang aren't going to invent a call to some Windows kernel function just because you asked them to call a sysv_abi function from an ms_abi function, for various reasons. ms_abi still allows clobbering xmm0..5 or whatever the cutoff is, and the high halves of all ymm/zmm registers, which is obviously a disaster if that's the only copy of user-space state. – Peter Cordes Nov 18 '22 at 07:53
  • Using inline asm seems like a bad idea, but I guess if you can't get the compiler to respect `sysv_abi` it could work. But only if you declare clobbers on R10 and R11, and RAX, and list all the arg-passing registers as `"+D"(cv)` etc. because they're also call-clobbered. – Peter Cordes Nov 18 '22 at 07:58
  • See [Calling printf in extended inline ASM](https://stackoverflow.com/q/37502841) for an example. You can omit the xmm/mmx/x87/k clobbers since the C caller will be using `-mgeneral-regs-only` or an equivalent like `-mno-sse -mno-mmx -mno-x87` if it doesn't have that. Since this is kernel code (also the caller is ms_abi), it's definitely not using a red-zone so you don't need to wrap with `sub rsp, 128` / `add rsp, 128`, and there are no stack args. It would still be better for the compiler to know that there's a function call. – Peter Cordes Nov 18 '22 at 08:00
  • Oh, and of course if your function derefs any of those pointer args, the pointed-to memory needs to be a dummy input and/or output. Or you need a `"memory"` clobber; that's much easier to get right, and only hurts optimization the same amount as a non-inline function call. [How can I indicate that the memory \*pointed\* to by an inline ASM argument may be used?](https://stackoverflow.com/q/56432259) – Peter Cordes Nov 18 '22 at 08:13
  • @PeterCordes Oh that's a lot of info. Why did you list R10 R11 specifically? I was simply going to add any register used by the function, ignoring things that are listed as volatile. As in, only save things "used" from "The x64 ABI considers registers RBX, RBP, RDI, RSI, RSP, R12, R13, R14, R15, and XMM6-XMM15 nonvolatile" I will have to check on Windows kernel function for using XMM. – lundman Nov 18 '22 at 10:06
  • Your function has 6 args, right? You can't declare a clobber on a register that's also an input or output operand, so that leaves only R10 and R11 being neither an arg nor a return value in the ABI, pure scratch regs. And with no return value, you also need to tell the compiler about RAX. (If there were only 5 args, also R9 should be a clobber instead of a read+write input.) – Peter Cordes Nov 18 '22 at 10:14
  • But sure, if your function doesn't actually destroy all the call-clobbered registers the ABI allows it to, and you don't mind keeping that call site in sync with the function for future changes, then yeah you don't need to tell the compiler about clobbers that don't actually happen. Just make sure to leave a big comment at the top of the function to update whatever other file because you're using a custom calling convention merely based on x86-64 SysV, preferably next to where you document how the function uses each register so people will have to look at it when changing reg usage. – Peter Cordes Nov 18 '22 at 10:17
  • But yes, for the idea of a macro, I will add all registers there, so I don't have to tweak for each function (well, maybe a regular, and regular+xmm). Looks like I don't have to do anything for xmm usage (but doesn't hurt if I did) in x64, but for avx, I need to call `KeSaveExtendedProcessorState()` – lundman Nov 18 '22 at 10:23

1 Answer

It turns out this works rather well. For example:

void __attribute__((sysv_abi)) sysv_abi_func(int p1, int p2, int p3, int p4, int p5, int p6, int p7);

void call_sysv_abi(int p1, int p2, int p3, int p4, int p5, int p6, int p7) {
    sysv_abi_func(p1, p2, p3, p4, p5, p6, p7);
}
       0: 56                            pushq   %rsi
       1: 57                            pushq   %rdi
       2: 48 81 ec c8 00 00 00          subq    $200, %rsp
       9: 44 0f 29 bc 24 b0 00 00 00    movaps  %xmm15, 176(%rsp)
      12: 44 0f 29 b4 24 a0 00 00 00    movaps  %xmm14, 160(%rsp)
      1b: 44 0f 29 ac 24 90 00 00 00    movaps  %xmm13, 144(%rsp)
      24: 44 0f 29 a4 24 80 00 00 00    movaps  %xmm12, 128(%rsp)
      2d: 44 0f 29 5c 24 70             movaps  %xmm11, 112(%rsp)
      33: 44 0f 29 54 24 60             movaps  %xmm10, 96(%rsp)
      39: 44 0f 29 4c 24 50             movaps  %xmm9, 80(%rsp)
      3f: 44 0f 29 44 24 40             movaps  %xmm8, 64(%rsp)
      45: 0f 29 7c 24 30                movaps  %xmm7, 48(%rsp)
      4a: 0f 29 74 24 20                movaps  %xmm6, 32(%rsp)
      4f: 8b 84 24 10 01 00 00          movl    272(%rsp), %eax
      56: 8b 84 24 08 01 00 00          movl    264(%rsp), %eax
      5d: 8b 84 24 00 01 00 00          movl    256(%rsp), %eax
      64: 44 89 4c 24 1c                movl    %r9d, 28(%rsp)
      69: 44 89 44 24 18                movl    %r8d, 24(%rsp)
      6e: 89 54 24 14                   movl    %edx, 20(%rsp)
      72: 89 4c 24 10                   movl    %ecx, 16(%rsp)
      76: 8b 84 24 10 01 00 00          movl    272(%rsp), %eax
      7d: 44 8b 8c 24 08 01 00 00       movl    264(%rsp), %r9d # P6
      85: 44 8b 84 24 00 01 00 00       movl    256(%rsp), %r8d # P5
      8d: 8b 4c 24 1c                   movl    28(%rsp), %ecx  # P4
      91: 8b 54 24 18                   movl    24(%rsp), %edx  # P3
      95: 8b 74 24 14                   movl    20(%rsp), %esi  # P2
      99: 8b 7c 24 10                   movl    16(%rsp), %edi  # P1
      9d: 89 04 24                      movl    %eax, (%rsp)
      a0: e8 00 00 00 00                callq   0xa5 <call_sysv_abi+0xa5>
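
The disassembly shows the compiler doing the register shuffle. The same thing can also be checked at run time with a minimal round-trip test, along the lines suggested in the comments. This is only a sketch under stated assumptions: `sysv_callee`, `ms_caller` and `check_abi_round_trip` are hypothetical names, the callee is plain C standing in for the assembler routine, and the attributes are only meaningful when building for x86-64 with gcc or clang.

```c
#include <assert.h>

/* Stand-in for the assembler routine (hypothetical name). With sysv_abi it
 * expects its arguments in rdi, rsi, rdx, rcx, r8, r9, and then on the stack. */
__attribute__((sysv_abi))
long sysv_callee(long p1, long p2, long p3, long p4, long p5, long p6, long p7)
{
    /* Give each argument its own decimal digit so a swapped order is visible. */
    return p1 * 1000000 + p2 * 100000 + p3 * 10000 + p4 * 1000
         + p5 * 100 + p6 * 10 + p7;
}

/* Caller that itself uses the Windows x64 convention (hypothetical name).
 * The compiler has to move the arguments into the SysV registers here. */
__attribute__((ms_abi))
long ms_caller(long p1, long p2, long p3, long p4, long p5, long p6, long p7)
{
    return sysv_callee(p1, p2, p3, p4, p5, p6, p7);
}

/* Fails the assertion if any argument arrives in the wrong position. */
void check_abi_round_trip(void)
{
    assert(ms_caller(1, 2, 3, 4, 5, 6, 7) == 1234567L);
}
```

If the attribute is silently dropped somewhere, a test like this still passes when both functions are C, so for the real case the callee has to be the assembler routine itself. The sketch only shows the shape of the check.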

My issue was something I had done in the past to make it compile the headers with MSVC++:

#ifdef _MSC_VER
#define __attribute__(X)
#endif

since MSC can't understand them. Sigh.
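
Why this fails without any warning can be reproduced on any compiler: a function-like macro named `__attribute__` removes the attribute during preprocessing, so the compiler never sees `sysv_abi` at all. The sketch below stringifies the same declaration with and without such a shim. The names `STR`, `with_shim`, `without_shim` and `check_shim_demo` are made up for illustration, and the `!defined(__clang__)` guard at the end is a suggested variant, not something taken from the original headers.

```c
#include <assert.h>
#include <string.h>

#define STR_(x) #x
#define STR(x)  STR_(x)   /* expand macros inside x first, then stringify */

/* 1. Unconditional shim, as in the old header: every attribute vanishes. */
#define __attribute__(X)
static const char *with_shim = STR(void __attribute__((sysv_abi)) f(int));
#undef __attribute__

/* 2. No shim: the attribute text survives preprocessing. */
static const char *without_shim = STR(void __attribute__((sysv_abi)) f(int));

/* 3. Guarded variant (assumption): only hide attributes from real MSVC.
 *    clang-cl defines _MSC_VER too, but it also defines __clang__. */
#if defined(_MSC_VER) && !defined(__clang__)
#define __attribute__(X)
#endif

/* Confirms the shim erased the attribute and that it survives without it. */
void check_shim_demo(void)
{
    assert(strstr(with_shim, "sysv_abi") == NULL);
    assert(strstr(without_shim, "sysv_abi") != NULL);
}
```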

Big thanks to mstorsjo over at llvm discourse.

lundman
  • Oh, and clang-cl defines `_MSC_VER`, so this macro broke your code? xD. [How to tell Clang to stop pretending to be other compilers?](https://stackoverflow.com/q/38499462) - you can't. Best to do something like `#ifdef __GNUC__` / `#define SYSV_ABI __attribute__((sysv_abi))` / `#else` define it as something that will error if used like `#define SYSV_ABI SYSV_ABI_REQUIRES_GNU_EXTENSIONS`, or `#error` on the spot during preprocessing. For some function tags, there are equivalents for other compilers so it makes sense to use portable names. – Peter Cordes Nov 20 '22 at 03:30
  • Yeah, `_MSC_VER` won't do. At the time I just wanted all attributes gone for a quick port; this time, I simply went through them all and used the closest `__declspec`. I.e., proper work, and there weren't many. – lundman Nov 20 '22 at 05:48
  • Only clang-cl and MSVC (and ICC) understand `__declspec`. Seems to me like a worse less-portable choice to use it instead of a macro you can define as appropriate. But if the project is small enough, then sure, you can always change it again later. – Peter Cordes Nov 20 '22 at 05:50
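
The macro approach suggested in the comments above could look roughly like the following. This is a minimal sketch, not the project's actual header: `SYSV_ABI`, `sum6` and `check_sum6` are hypothetical names, and `sum6` has a C body only so that the example links. `__clang__` is tested alongside `__GNUC__` on the assumption that clang-cl may not define `__GNUC__`, and the empty fallback for non-x86-64 targets is likewise an assumption made for the demo.

```c
#include <assert.h>

/* SYSV_ABI: hypothetical project-wide macro wrapping the attribute. */
#if defined(__GNUC__) || defined(__clang__)
#  if defined(__x86_64__) || defined(_M_X64)
#    define SYSV_ABI __attribute__((sysv_abi))
#  else
#    define SYSV_ABI /* assumption: no Windows/SysV split on this target */
#  endif
#else
#  error "SYSV_ABI needs GNU-style attributes (gcc, clang or clang-cl)"
#endif

/* Prototype as it might appear in a shared header (simplified types). */
SYSV_ABI unsigned sum6(unsigned a, unsigned b, unsigned c,
                       unsigned d, unsigned e, unsigned f);

/* C stand-in for the assembler body. */
SYSV_ABI unsigned sum6(unsigned a, unsigned b, unsigned c,
                       unsigned d, unsigned e, unsigned f)
{
    return a + b + c + d + e + f;
}

/* Callers only need to see the prototype to pick the right registers. */
void check_sum6(void)
{
    assert(sum6(1, 2, 3, 4, 5, 6) == 21u);
}
```

Failing with `#error` on a compiler that cannot honour the attribute makes the problem visible at build time, rather than letting the attribute be dropped silently.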