Segmentation fault (core dumped) - C calling Assembly code for prime testing

Question

I have an assembly language code I'm calling from C but I keep getting error: Segmentation fault (core dumped) I don't know what's the cause.

; this is the assembly code
section .text

global isPrime

isPrime: mov ecx, 2
.test:   cmp ecx, [rsi]
         jle .doif
         jg .prime

.doif:   mov eax, [rdi]
         mov edx, 0
         mov ebx, [rsi]
         div ebx
         cmp edx, 0
         jle .notPrime
         jg .checkAgain
.notPrime:
         mov eax, 1
         ret

.checkAgain:
         inc ecx
         jmp .test

.prime: mov eax, 0
         ret

The C code:

// C code

#include <stdio.h>

extern int isPrime(int *number, int *mValue);

int main() {
    int limit, m, input = 0;
    printf("Enter the limit of the prime numbers:");
    input = scanf("%d", &limit);
    while (input != 1) {
        printf("Not a number!\n");
        scanf("%*[^\n]");
        printf("Enter the limit of the prime numbers:");
        input = scanf("%d", &limit);
    }
    for (int i = 2; i <= limit; ++i) {
        m = i / 2;
        int flag = isPrime(&i, &m); //this is what I'm trying to implement
        // for (int j = 2; j <= m; j++) {
        //     if (i % j == 0) {
        //         printf("Number %d is not prime\n", i);
        //         flag = 1;
        //         break;
        //     }
        // }
        printf("%d\n", flag);
        if (flag == 0)
            printf("Number %d is prime\n", i);
    }
  return 0;
}

Error:

Enter the limit of the prime numbers:10
0
0
Segmentation fault (core dumped)

the commented part in the C code is what I want to write in assembly but got the error I mentioned above. From my research, I'm trying to write a memory address I do not have access to. The error is from assembly code but I don't know where exactly, please any possible solutions?

Besides the calling-convention problem, `isprime` could be a *lot* more efficient. After checking if it's greater than 2 and odd (`test byte [rdi], 1`), you can increment by 2 so you only try odd divisors starting with 3. You only need to check up to sqrt(n), i.e. when `n / trial_divisor <= trial_divisor`. Also, it makes no sense to have the caller compute `n/2` or `sqrt(2)` and pass it as another arg; that's something `isprime` should be doing itself, like `mov` / `shr`, or with the divisor >= quotient trick. — Peter Cordes, Nov 13 '22 at 05:57
See [Checking if a number is prime in NASM Win64 Assembly](https://codereview.stackexchange.com/a/204965) on codereview. Also, if you have a conditional branch inside the loop, normally more efficient to lay it out so it's not-taken in the normal case. i.e. you should lay it out like `if(remainder == 0) return 0`, so `test edx,edx` / `jz .prime`. Also, it's called "isprime", but it returns 0 (false) for prime numbers, true for composite? — Peter Cordes, Nov 13 '22 at 06:00

score 3 · Answer 1 · edited Nov 14 '22 at 02:15

System V x86_64 calling convention requires registers rbx, rsp, rbp, r12, r13, r14, r15 to be preserved during function call. You use ebx register as a part of rbx but don't preserve it.

Straightforward solution is to preserve it on stack:

bits 64
section .text

global isPrime

isPrime: push rbx
         mov ecx, 2
.test:   cmp ecx, [rsi]
         jle .doif
         jg .prime

.doif:   mov eax, [rdi]
         mov edx, 0
         mov ebx, [rsi]
         div ebx
         cmp edx, 0
         jle .notPrime
         jg .checkAgain
.notPrime:
         pop rbx
         mov eax, 1
         ret

.checkAgain:
         inc ecx
         jmp .test

.prime:  pop rbx
         mov eax, 0
         ret

Note: there are two return points, so you have to restore rbx before each return.

It is also possible to save register before modification for each iteration. It saves one instruction of code size at the cost of executing push/pop every time through the loop. div is slow, but this might be even slower on some modern CPUs. (Example how not to do):

...
.doif:   push rbx       ; don't do this, save/restore around the whole function
         mov eax, [rdi]
         mov edx, 0
         mov ebx, [rsi]
         div ebx
         pop rbx
...

Better solution is to use scratch register which is allowed to be modified instead of ebx.

         mov r8d, [rsi]
         div r8d

Or even divide by value in memory:

        div dword [rsi]

Some another improvements:

For performance reason you can load values into registers at the beginning.
jle instruction is redundant: CPU continue to execute correct branch if jump if greater is not executed.
cdq instruction may be used to extend value in eax into edx:eax. Note: it behaves differently from mov edx,0 because it extends sign bit. However we receive pointers to signed integers, so in theory we should be using cdq and idiv, not zero-extend and div. (You don't want cdq + div; if your input was negative, div will treat it as huge unsigned, and fault when the quotient doesn't fit in 32 bits.)
For equality to zero test edx,edx may be used instead of cmp edx,0, these instructions are one byte shorter and equal or faster. Then jz/jnz conditional jump is more appropriate for semantic meaning, but the same machine instruction as je/jne.
A similar x86 peephole optimization is zeroing a register with xor eax, eax
You can increment ecx unconditionally. It saves one jump instruction.

But there is logic error in your original code: all divisions are by the same value. For prime test you should divide by all values sequentially. (Or just by odd values, after checking once for an even number, so you can increment your counter by 2.)

         div ecx

So final code is:

section .text

global isPrime

isPrime: mov edi, [rdi]      ; load the pointed-to integers
         mov esi, [rsi]      ;  better: pass by value in the first place
         mov ecx, 2          ; starting divisor, and anything signed-less-than this is assumed prime.

                           ; do {
.test:   cmp ecx, esi
         jg .prime           ; if(ecx > endpoint) return yes

         mov eax, edi
         xor edx, edx        ; zero-extend EAX into EDX:EAX, or use cdq/idiv for signed division
         div ecx
         inc ecx
         test edx, edx
         jnz .test         ; }while( n % ecx == 0)
     ;; composite if we left the loop this way
         mov eax, 1
         ret

.prime:  xor eax, eax      ; eax=0
         ret

You declared your C variables as signed int, but you're using unsigned div in asm. Unsigned makes the most sense; normally people don't talk about negative primes.

Passing a 2nd arg in ESI/RSI doesn't make much sense; the upper bound divisor limit is part of the prime-testing algorithm, not something the caller should need to calculate for it. mov esi, edi ; shr esi, 1.

You can actually stop much sooner for large numbers, at sqrt(n) instead of n/2. Or when n/divisor <= divisor, so you don't actually have to compute a square root. See a code-review answer for a loop that does that, and increments by 2.

The `jle` can be eliminated because they jump to the "fall throughs" — Craig Estey, Nov 13 '22 at 03:00
@CraigEstey there are some imperfections in original code but i tried to keep the example close to original as much as possible. — dimich, Nov 13 '22 at 03:05
Better yet: Instead of using `rbx`, use a register than _can_ be trashed by the callee: (e.g.) `r9` — Craig Estey, Nov 13 '22 at 03:05
@CraigEstey: Or since the code doesn't need to keep reloading `[rsi]` from memory, just `mov esi, [rsi]` and use `cmp ecx, esi` and `div esi`. Or better, have the caller pass the numbers by value, not reference. Or if you don't want to do that, `div dword [rsi]` is a drop-in replacement. (dimich: The current edit to use R11 to save RBX is not what Craig meant; if you insist on reloading that value twice per loop, `mov r11d, [rsi]` / `div r11d`. And if you were going to save/restore RBX, you'd do it outside the loop.) — Peter Cordes, Nov 13 '22 at 05:51
@PeterCordes I made that edit before the comment appeared. I also thought about using another register for divisor but it is out of scope of OP question. There are many things to impove in the code. — dimich, Nov 13 '22 at 05:59
Yeah, there are lots of things to improve. Even more after your proposed changes, unfortunately: you put the push/pop inside the loop, instead of around the whole function. And instead of just using a different register, you made the loop code even more clunky. You achieved correctness, but *not* in a way I'd recommend anyone learn from. (See my codereview answer [Checking if a number is prime in NASM Win64 Assembly](//codereview.stackexchange.com/a/204965) for code hat sucks as little as possible, given that it's doing trial division. But that's a whole `main`, not separate `isprime`) — Peter Cordes, Nov 13 '22 at 06:03
In that codereview answer, look at the code starting from `mov r8d, [rsp+32]` which loads a scanf result. It doesn't touch any call-preserved registers the whole function. (And for Windows x64, that means avoiding ESI / EDI, unlike in x86-64 SysV. We have more low / legacy registers to play with here.) — Peter Cordes, Nov 13 '22 at 06:11
@PeterCordes Just using different register implies changing operand size. I don't know OP's requirements and why they decided to pass pointers instead of values. The question is about reason of misbehavior, not request for optimization. If i had such task, i would not use assembly at all. — dimich, Nov 13 '22 at 06:19
No it doesn't. `r11d` and `r8d` are 32-bit registers, exact drop-in replacements for `ebx`. Just like `r11` is a drop-in replacement for `rbx`. https://wiki.osdev.org/CPU_Registers_x86-64. And `div dword [rsi]` is also a drop-in replacement for those few lines, if you don't want to hoist a load out of the loop. — Peter Cordes, Nov 13 '22 at 06:20
I figured out I don't need the `ebx/rbx` registers to get the prime. Thank so much @PeterCordes for the links and @dimich for the code; and everyone for the comments. — Bille Ibinabo, Nov 13 '22 at 10:11
@dimich: Oh BTW, your first code block is actually fine; you *do* have a version that pushes/pops around the whole function like how call-preserved registers should be used for efficiency; I missed that when writing an earlier comment. Only the last 2 code blocks are bad suggestions. — Peter Cordes, Nov 13 '22 at 12:08
@PeterCordes Yep, it flew out of my head that r8-r15 have 32-bit parts :) Updated answer. — dimich, Nov 14 '22 at 01:50
That's better. Added a few updates myself, e.g. don't `or edx,edx`, and use either `cdq`;`idiv` or `xor edx,edx` / `div`; don't mix and match unless you have a signed dividend and unsigned divisor. Since your division is unreachable for negative numbers, best to use an efficient zeroing instruction. And just for simplicity and consistency; we're not code-golfing for minimal machine-code bytes at the expense of beginners being able to learn general good practices from it. — Peter Cordes, Nov 14 '22 at 02:19

score 1 · Answer 2 · answered Nov 13 '22 at 10:15

Just in case anyone is facing similar issue and needs help:

section .text
global isPrime

isPrime: mov ecx, 2
.test:   cmp ecx, [rsi]
         jle .doif
         jg .prime

.doif:   mov eax, [rdi]
         mov edx, 0
         div ecx
         cmp edx, 0
         jg .checkAgain
         mov eax, 1
         ret

.checkAgain:
         inc ecx
         jmp .test

.prime:  mov eax, 0
         ret

Segmentation fault (core dumped) - C calling Assembly code for prime testing

2 Answers2