I would like to pass values from a C program to assembly using a separately assembled and linked routine rather than inline assembly in C. Below is the assembly program (GCD) that I am working on.

;gcdasm.nasm
bits 64
section .text
global gcdasm
gcdasm:
    push rbp
    mov rbp, rsp
    mov rax, [rbp+4]        ;load rax with x
    mov rbx, [rbp+8]        ;load rbx with y
top:
    cmp rax, rbx            ;x(rax) has to be larger than y(rbx)
    je exit                 ;if x=y then exit and return value y
    jb xchange              ;if x<y then swap x and y
modulo:
    cqo                     ;RDX:RAX sign extend
    div rbx                 ;div rdx:rax with rbx
    cmp rdx, 0              ;check remider if its 0
    je exit                 ;if reminder is 0 then exit return return y
    mov rax, rdx            ;reminder rdx as next dividend
    jmp modulo              ;loop 
xchange:
    xchg rax, rbx           ;swap x and y
    jmp modulo

exit:
    mov rax, rbx            ;Return c program with the divisor y
    mov rsp, rbp
    pop rbp
    ret

And this is the C program from which I am trying to pass the values to the assembly program:

//gcd.c
#include<stdio.h>

extern int gcdasm(int x, int y); 

int main(void){
    int x=0;
    int y=0;
    int result=0;

    x = 46; 
    y = 90; 
    printf("%d and %d have a gcd of %d\n", x,y,gcdasm(x,y));

    x = 55;
    y = 66;
    printf("%d and %d have a gcd of %d\n", x,y,gcdasm(x,y));

    return 0;
}

When I build it as shown below and run it, I get either a "Floating point exception" error or an empty prompt waiting for input:

$ nasm -felf64 gcdasm.nasm -o gcdasm.o
$ gcc gcdasm.o gcd.c -o gcd
$ ./gcd 
Floating point exception
$ ./gcd 

I am unable to figure out the error. Kindly help me out. Thank you.

bartop
Kanan Jarrus
  • The 64-bit calling convention passes the first 6 integer class parameters in registers (not on the stack). There is a summary of the 64-bit calling convention here: https://en.wikipedia.org/wiki/X86_calling_conventions#System_V_AMD64_ABI . You can find the complete 64-bit ABI here: https://www.uclibc.org/docs/psABI-x86_64.pdf – Michael Petch Feb 21 '18 at 06:49
  • @MichaelPetch Thank you very much, and also for the links. `mov rax, rdi ;load rax with x` and `mov rbx, rsi ;load rbx with y` – Kanan Jarrus Feb 21 '18 at 07:21
  • @MichaelPetch, my fault. The instruction is correct. Didn't have enough coffee yet... – Paul Ogilvie Feb 21 '18 at 08:52
  • Apologies for saying `mov rsp, rbp` should be deleted. It is correct. – Paul Ogilvie Feb 21 '18 at 08:53

1 Answer


Passing arguments to gcdasm()

The two int arguments are passed through registers, not the stack. The first and second arguments are passed in the lower-half of rdi and rsi (i.e.: edi and esi), respectively. So, by sign extending edi and esi into rax and rbx respectively, you load the passed arguments into those registers:

movsxd rax, edi  ;load rax with x (sign-extended from 32 bits)
movsxd rbx, esi  ;load rbx with y (sign-extended from 32 bits)

However, note that rbx is a callee-saved register, not a scratch register. The callee therefore has to save it before modifying it and restore it before returning from gcdasm.

The simplest fix is to replace rbx with rcx (which is not callee-saved) everywhere in your code. You don't need rbp at all either, since nothing is read from the stack, so you can remove every instruction in which rbp appears.
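The effect of sign-extending only the low half of each argument register can be modelled in C. The following is a minimal sketch, not part of the original code: the helper name sign_extend_low32 is hypothetical, and the narrowing conversion assumes the two's-complement wrap-around that GCC and Clang provide.

```c
#include <stdint.h>

/* Hypothetical helper: models what sign-extending edi into rax does to a
   64-bit register whose upper half may hold garbage left by the caller. */
int64_t sign_extend_low32(uint64_t reg)
{
    uint32_t low = (uint32_t)reg;   /* keep only the low 32 bits (edi / esi) */

    /* Assumption: converting an out-of-range uint32_t to int32_t wraps
       modulo 2^32, which is what GCC and Clang do. */
    return (int64_t)(int32_t)low;   /* widen to 64 bits with sign extension */
}
```

For example, sign_extend_low32(0xDEADBEEF0000002EULL) yields 46 even though the upper half of the register holds unrelated bits, which is why the full 64-bit register should not be read directly when the prototype declares int.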


Other problems

  • There is also a logic error in this instruction:

    mov rax, rdx   ;reminder rdx as next dividend
    

    Instead of this, the divisor (rcx) should become the dividend (rax) and the remainder (rdx) should become the divisor (rcx), that is:

    mov rax, rcx
    mov rcx, rdx
    
  • When dividing signed values, you have to use the idiv instruction, not div.
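To cross-check the assembly routine, the corrected Euclidean step can be written as a small C reference with the same prototype. This is only a sketch under stated assumptions: the name gcd_ref is hypothetical, the loop tests the divisor before dividing (so a zero argument does not trap, unlike a bare idiv), and the result is reported as non-negative, which is a choice made here rather than something the original code specifies.

```c
#include <stdint.h>

/* Hypothetical reference implementation with the same prototype as gcdasm.
   Comments name the register that plays each role in the assembly. */
int gcd_ref(int x, int y)
{
    int64_t dividend = x;   /* rax, sign-extended from the 32-bit argument */
    int64_t divisor  = y;   /* rcx, sign-extended from the 32-bit argument */

    while (divisor != 0) {
        /* Signed remainder, i.e. what idiv leaves in rdx. */
        int64_t remainder = dividend % divisor;

        dividend = divisor;     /* mov rax, rcx: divisor becomes dividend */
        divisor  = remainder;   /* mov rcx, rdx: remainder becomes divisor */
    }

    /* Assumption: return the magnitude so negative inputs give a
       non-negative result. Not meaningful if the magnitude is 2^31. */
    return (int)(dividend < 0 ? -dividend : dividend);
}
```

Comparing gcd_ref(x, y) with gcdasm(x, y) over a handful of inputs, including the pairs 46, 90 and 55, 66 from the question, is a quick way to spot a wrong register shuffle in the loop.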


Improvement

To compare rdx against zero, prefer test rdx, rdx over cmp rdx, 0. Both set the zero flag in the same way, but test has a shorter encoding because it carries no immediate operand.


With all that above in mind:

;gcdasm.nasm
bits 64
section .text
global gcdasm
gcdasm:
    movsxd rax, edi         ;load rax with x (sign-extended)
    movsxd rcx, esi         ;load rcx with y (sign-extended)
top:
    cmp rax, rcx            ;x (rax) should not be smaller than y (rcx)
    je exit                 ;if x = y then exit and return y
    jl xchange              ;if x < y (signed) then swap x and y
modulo:
    cqo                     ;sign-extend rax into rdx:rax
    idiv rcx                ;rdx:rax / rcx (signed values)
    test rdx, rdx           ;check whether the remainder is zero
    je exit                 ;if the remainder is 0 then exit and return y
    mov rax, rcx            ;divisor becomes dividend
    mov rcx, rdx            ;remainder becomes divisor
    jmp modulo              ;loop 
xchange:
    xchg rax, rcx           ;swap x and y
    jmp modulo

exit:
    mov rax, rcx            ;return the divisor y to the C caller
    ret
JFMR
  • The function signature takes `int` args; the caller is allowed to leave garbage in the high 32 bits of `rdi` and `rsi`, so it's incorrect as well as much slower to use 64-bit `div`. It's also incorrect to use `div` instead of `idiv` after `cqo`. (Related: see [GCD in 9 bytes of x86-64 machine code](https://codegolf.stackexchange.com/questions/77270/greatest-common-divisor/77364#77364), or 10 bytes for unsigned. 13 bytes for 64-bit operand-size. Optimized for code-size at the expense of performance, but with much simpler logic than this, and comments / explanation of why it works...) – Peter Cordes Feb 21 '18 at 16:29
  • @PeterCordes Many thanks for the review. It seems that it's working now for negative numbers as well. – JFMR Feb 21 '18 at 17:17
  • `movsxd` is one way to solve the problem, but *much* better would be to either change the prototype to `int64_t` or change all register names to `e*x` instead of `r*x`. Using 64-bit `idiv` on 32-bit operands is the worst of both worlds: larger code-size and [3x worse performance](https://stackoverflow.com/questions/40354978/why-is-this-c-code-faster-than-my-hand-written-assembly-for-testing-the-collat/40355466#40355466) for no benefit in terms of range of inputs accepted. – Peter Cordes Feb 21 '18 at 17:26
  • Also, then you could use `esi` instead of `ecx` everywhere; you still need to copy `edi` to `eax` because `idiv` uses implicit inputs, but choosing `ecx` instead of `ebx` was a missed opportunity to drop the `mov` altogether. – Peter Cordes Feb 21 '18 at 17:28
  • And BTW, you can use `imul` on signed values if you only want the low half of the result. e.g. `imul rcx, rdi, 12345` is the same binary operation regardless of whether you interpret it as signed or unsigned. You're thinking of the one-operand form which matches div/idiv, but your statement that you can't use `idiv` on signed inputs isn't quite true. In fact you *usually* don't need a full 64x64 -> 128b multiply, so 2 or 3 operand `imul` is the right choice because it's faster (1 uop) and all the operands are explicit (so it doesn't constrain register choice). – Peter Cordes Feb 21 '18 at 17:32