
Let's say that in my C++ file, I have the following:

extern "C" void __stdcall AsmTest(
    __m128i& chain0);

By examining the disassembly of the surrounding C++ code, I can see that chain0 is written to and read from with

(1)

movdqa xmmword ptr [rsp+60h], xmm0

and

(2)

movdqa xmm0, xmmword ptr [rsp+60h]

respectively. In my .asm file, I have

OPTION CASEMAP:NONE

PUBLIC AsmTest

.CODE

AsmTest:
    movdqa xmm0, xmmword ptr [rsp+60h]
    ret
END

Calling AsmTest(chain0) in my C++ code causes an access violation. How can I avoid this problem?

MNagy
  • Why do you need to do this? Isn't it simpler to just pass the necessary arguments to the called function? – user202729 Aug 30 '19 at 01:10
  • Also why do you need to use an assembly function instead of inline assembly? (calling a function pushes the return address on the stack) – user202729 Aug 30 '19 at 01:11
  • For the second question: I want to know how to do something like this because, if I had to write separate assembly for x86 and x64, I might find it more elegant to keep them in separate files. For the first: I'm still pretty new to x86 assembly and not sure I understand what you mean in this context, or, if I do understand, I don't know the proper syntax. Could you post an example? – MNagy Aug 30 '19 at 01:18
  • Use `vectorcall` to let MSVC pass `__m128i` values in XMM registers. Windows x64's default `fastcall` convention is bad for small functions. (Small functions are bad in general because a call gets in the way of optimizing the code around the call site, on top of the `call`/`ret` overhead itself.) – Peter Cordes Aug 30 '19 at 01:39
  • @user202729 - if this is 64 bit code, then Visual Studio doesn't support inline assembly. – rcgldr Aug 30 '19 at 01:49
  • Is there some reason you feel you must write this in asm? It's possible to use the xmm registers/instructions by using compiler intrinsics with straight C code (see the `_mm_` functions declared in xmmintrin.h, and in emmintrin.h for the `__m128i` integer ones). For example `__m128i x1 = _mm_load_si128(ptr);`. Indeed, you may be able to use the same code for both x86 and x64 and let the compiler generate the most efficient assembler for each. You aren't by chance intending to do AES compression, are you? – David Wohlferd Aug 30 '19 at 08:28
  • This is a learning exercise for me, but yes, I do intend to explore what (if any) optimizations I can make by writing my own assembly in the future... and yes, oddly enough, AES compression will likely be involved, but that may or may not be the focus. – MNagy Aug 30 '19 at 16:20
  • Spooky, huh? Actually I just answered a [question](https://stackoverflow.com/a/57684519/2189500) about xmm which also uses chains, which made me think of it. Note that his solution was to move from (particularly badly written) asm to intrinsics. Better perf, more maintainable code, and it solved a thread safety issue that was giving him problems. Also the compiler can compile intrinsics down to different instructions depending on the target (x86/x64) and features (SSE, AVX, etc.). While raw asm can (sometimes) be faster than C, it's typically hard for mortals to accomplish that. Food for thought. – David Wohlferd Aug 30 '19 at 23:13
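A minimal sketch of the intrinsics route suggested in the comments, assuming an x86/x64 compiler that provides <emmintrin.h>. The function name `xor_key_into_chains` and the XOR-with-a-key step are hypothetical placeholders for whatever per-chain work is actually needed; the point is that there is no .asm file and no calling-convention bookkeeping, and the same source builds for both x86 and x64.

```cpp
#include <cassert>
#include <cstddef>
#include <emmintrin.h>  // SSE2: __m128i, _mm_load_si128, _mm_xor_si128, ...

// Hypothetical per-chain step: XOR the same key into each of n chains.
// The compiler chooses registers and addressing modes for each target.
void xor_key_into_chains(__m128i* chains, std::size_t n, const __m128i& key) {
    for (std::size_t i = 0; i < n; ++i) {
        // Aligned load/store (the movdqa forms); declared __m128i objects
        // and arrays are 16-byte aligned by default.
        __m128i v = _mm_load_si128(chains + i);
        _mm_store_si128(chains + i, _mm_xor_si128(v, key));
    }
}
```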

1 Answer


Use vectorcall to let MSVC pass __m128i values in XMM registers. That only helps if you pass them by value, instead of forcing them into memory by using a reference.

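Windows x64's default fastcall convention is bad for small functions: it passes __m128i args through memory by reference, even when you declare them by value. (Small functions are bad in general because a call gets in the way of optimizing the code around the call site, on top of the call/ret overhead itself.)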
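A sketch of the by-value declaration, assuming an x86/x64 compiler that provides <emmintrin.h>. `VECCALL` and `AsmTestStandIn` are hypothetical names: the macro applies `__vectorcall` only for MSVC-compatible x86/x64 builds and expands to nothing elsewhere, and the stand-in C++ body mirrors the `return x;` test function used below so the snippet builds without the .asm file. In the real project the prototype would instead read `extern "C" __m128i VECCALL AsmTest(__m128i x);`.

```cpp
#include <cassert>
#include <emmintrin.h>  // __m128i and SSE2 intrinsics

// Hypothetical macro: __vectorcall is an MSVC extension for x86/x64 only.
#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
#define VECCALL __vectorcall
#else
#define VECCALL
#endif

// Takes and returns __m128i by value. With __vectorcall, x arrives in XMM0
// and the result goes back in XMM0. Under the default Windows x64
// convention, x would instead be passed as a pointer (in RCX) to a
// caller-made copy in memory.
__m128i VECCALL AsmTestStandIn(__m128i x) {
    return x;
}
```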


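Your test function is broken because [rsp+60h] in the callee is not the same address as [rsp+60h] in the caller. The call instruction itself pushes an 8-byte return address, so RSP in the callee is 8 lower than at the call site, and the callee's [rsp+60h] is the caller's [rsp+58h].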

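movdqa requires 16-byte alignment, so your load faults: the ABI requires RSP to be 16-byte aligned before a call, so after the 8-byte push it is misaligned by 8 inside the callee, and so is [rsp+60h]. (Windows reports the resulting fault as an access violation.)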
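To make the offset and alignment argument concrete, here is a small compile-time model in C++. It assumes a hypothetical caller RSP of 0x14FE00 (16-byte aligned at the call, as the ABI requires) and that the caller's RSP is the same at its own movdqa and at the call, which is typical of MSVC's fixed-size frames. The helper names are made up for illustration.

```cpp
#include <cstdint>

// `call` pushes an 8-byte return address, so RSP inside the callee is
// 8 lower than RSP at the call site.
constexpr std::uint64_t callee_rsp(std::uint64_t caller_rsp) {
    return caller_rsp - 8;
}

// The address that the callee's [rsp+disp] actually refers to.
constexpr std::uint64_t callee_address(std::uint64_t caller_rsp,
                                       std::uint64_t disp) {
    return callee_rsp(caller_rsp) + disp;
}

constexpr bool is_aligned16(std::uint64_t addr) { return addr % 16 == 0; }

// Hypothetical caller RSP, 16-byte aligned just before the call.
constexpr std::uint64_t kCallerRsp = 0x14FE00;

// The caller's [rsp+60h] (where chain0 lives) is aligned...
static_assert(is_aligned16(kCallerRsp + 0x60), "caller slot is aligned");
// ...but the callee's [rsp+60h] is the caller's [rsp+58h]...
static_assert(callee_address(kCallerRsp, 0x60) == kCallerRsp + 0x58, "");
// ...which is 8 bytes off 16-byte alignment, so movdqa from it faults.
static_assert(!is_aligned16(callee_address(kCallerRsp, 0x60)), "");
```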


But you shouldn't actually be accessing it relative to rsp at all: it's not passed as a stack-arg per se, but rather by reference using a pointer. When the first arg is an integer/pointer it goes in RCX. That's why you'll see the caller setting up RCX to hold a pointer to that stack space.
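To connect this to the question's declaration, here is a C++ rendering of what a corrected body for `extern "C" void AsmTest(__m128i& chain0)` has to do, assuming an x86/x64 compiler with <emmintrin.h>. The comments show roughly which instruction each line corresponds to when the reference arrives in RCX; the doubling step and the name `AsmTestEquivalent` are placeholders, and an optimizing compiler may emit different code.

```cpp
#include <cassert>
#include <emmintrin.h>

// The reference is passed as a pointer (first integer/pointer arg -> RCX),
// so every access goes through that pointer; nothing is read relative to RSP.
void AsmTestEquivalent(__m128i& chain0) {
    __m128i v = _mm_load_si128(&chain0);  // ~ movdqa xmm0, xmmword ptr [rcx]
    v = _mm_add_epi32(v, v);              // ~ paddd xmm0, xmm0 (placeholder work)
    _mm_store_si128(&chain0, v);          // ~ movdqa xmmword ptr [rcx], xmm0
}
```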

Let MSVC compile __m128i AsmTest(__m128i x){ return x; } with optimization enabled and see where it loads from. https://godbolt.org/z/7pvWqa

        movdqu  xmm0, XMMWORD PTR [rcx]
        ret

It uses movdqu instead of movdqa because MSVC would rather make your code run slow on old CPUs like Core 2 and K8/K10 than fault when you misalign a __m128i. Apparently.
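The same choice shows up with intrinsics: `_mm_loadu_si128` is the movdqu form (vmovdqu with AVX enabled) and accepts any address, while `_mm_load_si128` is the movdqa form and requires 16-byte alignment. A minimal sketch, assuming an x86/x64 compiler with <emmintrin.h>; `load_at` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <emmintrin.h>

// Load 16 bytes starting at buf + offset, where the address may be
// misaligned. Swapping in _mm_load_si128 (movdqa) here would fault
// whenever (buf + offset) is not a multiple of 16.
__m128i load_at(const unsigned char* buf, std::size_t offset) {
    return _mm_loadu_si128(reinterpret_cast<const __m128i*>(buf + offset));
}
```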


BTW, learning from compiler output is helpful when you know enough to understand why the compiler is doing what it's doing, and just need to check the details.

You should also look up documentation on the calling convention. See links in https://stackoverflow.com/tags/x86/info.

Peter Cordes
  • This seems close to answering my question, but I'm still having some trouble. Being so new to this, I'm still struggling with syntax and haven't been able to get a version using vectorcall to compile; I suppose I haven't yet found the right example. Ignoring the movdqa xmm0, xmmword ptr [rsp+60h] line, could you post how you would change the .asm I posted so that it compiles with the corresponding extern "C" void __vectorcall AsmTest( __m128i& chain0);? I would really appreciate it. – MNagy Aug 30 '19 at 21:00
  • You never actually say what "still having trouble" means. Is there an error message? At a guess, you're failing to correctly [decorate](https://learn.microsoft.com/en-us/cpp/build/reference/decorated-names?view=vs-2019) the name of the function in your asm file. I'd expect something like `AsmTest@@8`, where `@@` indicates a vectorcall, and 8 is *a decimal number of bytes in the parameter list.* 64-bit pointers are 8 bytes long. – David Wohlferd Aug 31 '19 at 10:32
  • @David: the @@8 suffix is only for 32-bit conventions where the callee pops stack args. – Peter Cordes Aug 31 '19 at 15:32
  • @DavidWohlferd your guess was absolutely correct! Now it's working and I can continue with my futzing and fiddling. – MNagy Sep 01 '19 at 18:15