Copying a value into the address of a pointer in an assembly function called from C++ (NASM)

Question

I'm trying to learn x86-64 assembly, and I found the book "Modern X86 Assembly Language Programming: Covers x86 64-bit, AVX, AVX2, and AVX-512", but it uses MASM and Visual C++, and I use Linux. So I'm trying to convert some of its programs to NASM syntax, but I ran into a problem storing the result of a calculation through a pointer passed to the function. The C++ code is:

#include <iostream>
#include <iomanip>
#include <bitset>
using namespace std;
extern "C" int IntegerShift_(unsigned int a, unsigned int count, unsigned int* a_shl, unsigned int* a_shr);
static void PrintResult(const char* s, int rc, unsigned int a, unsigned int count, unsigned int a_shl, unsigned int a_shr)
{
        bitset<32> a_bs(a);
        bitset<32> a_shl_bs(a_shl);
        bitset<32> a_shr_bs(a_shr);
        const int w = 10;
        const char nl = '\n';

        cout << s << nl;
        cout << "count = " << setw(w) << count << nl;
        cout << "a = " << setw(w) << a << " (0b" << a_bs << ")" << nl;

        if (rc == 0)
                cout << "Invalid shift count" << nl;
        else
        {
                cout << "shl = " << setw(w) << a_shl << " (0b" << a_shl_bs << ")" << nl;
                cout << "shr = " << setw(w) << a_shr << " (0b" << a_shr_bs << ")" << nl;
        }
        cout << nl;
}

int main()
{
        int rc;
        unsigned int a, count, a_shl, a_shr;
        a = 3119;
        count = 6;
        rc = IntegerShift_(a, count, &a_shl, &a_shr);
        PrintResult("Test 1", rc, a, count, a_shl, a_shr);

    return 0;
}

This code tests the function IntegerShift_, which is written in assembly. (There are a few more tests in main that I didn't include here, since they are basically the same as this one.) The original assembly code in the book is MASM code:

    
.code
IntegerShift_ proc
xor eax,eax 
cmp edx,31            
ja InvalidCount            
xchg ecx,edx    
mov eax,edx  
shl eax,cl    
mov [r8],eax  
shr edx,cl   
mov [r9],edx    
mov eax,1
InvalidCount:    
ret    
IntegerShift_ endp
end

The obvious way to translate this into NASM code (at least from what I know) is the following:

section .text
global IntegerShift_
IntegerShift_:
xor eax,eax
cmp esi,31           
ja InvalidCount            
xchg ecx,esi    
mov eax,esi  
shl eax,cl    
mov [rdx],eax  
shr esi,cl   
mov [rsi],esi    
mov eax,1
InvalidCount:    
ret

However, assembling, compiling, and running this with:

nasm -f elf64 [asm filename]
g++ -Wall -no-pie [object file filename] [cpp filename] -o prog
./prog

results in a segmentation fault. I tried solving this every way I could think of and spent more than a couple hours on this, but I couldn't get it to work. I'm almost certain the problem is the way I try to store the results in the addresses of the a_shl and a_shr pointers, but I can't understand what I'm doing wrong and I will really appreciate some help. Thanks in advance!

Note that choosing the parameter order to match the calling convention would save some instructions, e.g. pass count first on Windows, or 4th if you're optimizing for x86-64 System V, so it's already in ECX, and you don't have another arg you need already there. With the current order, `mov eax, ecx` + `xchg ecx, edx` is the smallest (in bytes) way. Doing `mov` first is better: it avoids making it dependent on an output of xchg. But 3 total `mov` instructions would get the job done (3 total uops, instead of 1 + 3 on Intel). e.g. for Windows, `mov eax, ecx` / `mov ecx, edx` / `mov edx, eax` — Peter Cordes, Jan 06 '21 at 21:30
3 movs is what compilers do: they pretty much never use `xchg r,r`: https://godbolt.org/z/W8Pz6G. And of course if you have BMI2, `shlx eax, ecx, edx` / store eax / `shrx eax, ecx, edx` / store eax. https://www.felixcloutier.com/x86/sarx:shlx:shrx — Peter Cordes, Jan 06 '21 at 21:31

score 2 · Accepted Answer · answered Jan 06 '21 at 15:11

First, the calling conventions are different between Windows and Linux.

https://en.wikipedia.org/wiki/X86_calling_conventions

It appears you only partially adapted the code to the Linux (System V) convention.

Second, while you can mostly use 32-bit registers, you must treat addresses as their full 64-bit values.

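Finally, you modify esi and then use rsi as an address. They are the same register (esi is the low 32 bits of rsi), and writing esi zeroes the upper 32 bits of rsi, so the pointer is destroyed. This is what caused your segmentation fault.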

With those changes:

;extern "C" int IntegerShift_(unsigned int a, unsigned int count, unsigned int* a_shl, unsigned int* a_shr);
; System V args: a = EDI, count = ESI, a_shl = RDX, a_shr = RCX

section .text
global IntegerShift_
IntegerShift_:
xor eax,eax
cmp esi,31
ja InvalidCount
xchg rcx,rsi    ; Need full 64-bit exchange
mov eax,edi     ; (r)di is the 'a' value
shl eax,cl
mov [rdx],eax
shr edi,cl      
mov [rsi],edi
mov eax,1
InvalidCount:
ret

Halt State


To be fair, the OP didn't completely ignore the calling-convention differences, e.g. they correctly changed to using `esi` instead of `edx` for the shift-count (2nd arg). And RDX for R8 (3rd arg). But then they seem to get most everything else wrong, e.g. still using RSI as a pointer (possibly a typo for RDI, which would also be wrong?) – Peter Cordes Jan 06 '21 at 21:15
Also, you can replace `xchg` (3 uops on Intel) with just 2 `mov` instructions by picking an unused register, like compilers do. https://godbolt.org/z/Yec1TT. That will let mov-elimination work. If you do want to use `xchg`, the `dst -> src` direction [is the faster one on Intel, at least in my testing on Skylake](https://stackoverflow.com/questions/45766444/why-is-xchg-reg-reg-a-3-micro-op-instruction-on-modern-intel-architectures), so `xchg rsi, rcx` would make CL ready a cycle earlier than `xchg rcx,rsi`. But still 1 cycle latency instead of 0 on CPUs with mov-elimination. – Peter Cordes Jan 06 '21 at 21:43
While this is probably better, I am using a book and I wanted to make the program work on my system first before I try messing around with the code and trying to improve it. Anyways, I had no idea that mov-elimination exists, so even if I did try to make the code better, I'd probably not think of doing that, so thanks! – Amnon Tal Jan 07 '21 at 00:16

score 1 · Answer 2 · answered Jan 06 '21 at 15:05

Can you single-step through the code? In that case you should be able to find where it segfaults.

Another approach is to write the code in C and look at the disassembly listing. If the code works in C you can take the assembler code and optimize it to your liking. Have you tried that?

Also, have you seen this article? http://left404.com/2011/01/04/converting-x86-assembly-from-masm-to-nasm-3

StureS


I tried to disassemble a C version of the code, but for some reason I couldn't find the function in the assembly code (I think gcc changes the name of stuff during compilation or something). About the article, I actually used it a lot when trying to convert the programs in the book into nasm code, but apparently my mistake was forgetting to change some things and treating the registers as 32 bit instead of 64 bit. Thanks anyways! – Amnon Tal Jan 06 '21 at 15:26
@AmnonTal: If you use `extern "C"`, compilers won't change the name. Or just compile it as C. And look at gcc or clang asm output, not disassembly, so you don't have to find it among other code. [How to remove "noise" from GCC/clang assembly output?](https://stackoverflow.com/q/38552116) – Peter Cordes Jan 06 '21 at 21:45
