0

I've written an x86 implementation of the Collatz Conjecture that takes an int and an amount of runs and then reduces the int to one using the Collatz conjecture and returns the number of iterations it took to do so, but I'm receiving a weird segmentation fault with no clear source when I try and run it using my cpp testing file

My x86 code is

global threexplusone
section .text
threexplusone:

    push rbp        ; load arg register
    mov rbp, rsp        ; move stack pointer to arg

    mov rax, [rbp+8]    ; input arg to rax
    mov rcx, rax        ; save arg in rcx
    cmp rax, 1      ; x == 1 ? Base Case
    je baseCase     

    xor rdx, rdx
    mov rbx, 2
    idiv rbx
    cmp rdx, 0      ; rax % 2 == 0 ? recurse : odd
    je recurse
    jmp odd


odd:
    mov rax, rcx        ; restores arg
    lea rax, [3*rax+1]  ; collatz math
    jmp recurse

recurse:
    push rax
    call threexplusone
    add rax, 1      ; numRuns++
    jmp end

baseCase:
    mov rax, 0      ; save rax to prevent segfault

end:
    mov rsp, rbp
    pop rbp
    ret

and my cpp testing is:

#include <iostream>
#include "timer.h"
#include <cstdlib>
#include <string>

using namespace std;

extern "C" int threexplusone(int);

int main() {
    int x, n, firstRun;

    timer t1;

    cout<<"Enter the number to test collatify"<<endl;
    cin>>x;

    cout<<"Enter the number of runs"<<endl;
    cin>>n;

    t1.start();

    for(int i=0;i<n;i++) {
        threexplusone(x);
    }
    t1.stop();

    firstRun=threexplusone(x);

    double mean = (t1.getTime()*1000)/n;

    cout<<"Number of iterations: "<<firstRun<<endl;
    cout<<"Mean runtime: "<<mean<<"ms"<<endl;
    return 0;
}

I know for sure that the implementation of the timer works just fine, but I'm confused as to what could possibly be causing the segmentation fault here. I've tried some things, like xoring some of the variables before using them or using imul instead of lea, but nothing has changed the issue thusfar.

Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
Xadash
  • 1
  • 2
  • 1
    `mov rax, [rbp+8]` Nope. Neither of the 2 calling conventions used on x86-64 pass args on the stack. [Why does Windows64 use a different calling convention from all other OSes on x86-64?](https://stackoverflow.com/q/4429398). Also, an `int` arg is only 32 bits, so it doesn't make sense to load 64. – Peter Cordes Nov 03 '20 at 06:34
  • 1
    Also, you don't want `idiv` for division by 2. [Why does C++ code for testing the Collatz conjecture run faster than hand-written assembly?](https://stackoverflow.com/q/40354978) – Peter Cordes Nov 03 '20 at 06:36
  • Would replacing mov rbx, 2 idiv rbx with shr rbx, 1 be a better alterenative? – Xadash Nov 03 '20 at 06:42
  • No, because your value isn't in RBX at that point. But yes you want SHR by 1. (Also note that RBX is a call-preserved register. The C++ compiler expects you to leave it with the same value you found in it, like with RBP. Easiest way is to not touch it at all; there are several other call-clobbered registers you can use without saving/restoring.) – Peter Cordes Nov 03 '20 at 06:43
  • So I changed rbx to rsi and altered the way I divided by 2 to using shr, 1, then moving the value back to rsi, and I was wondering if there were further ways I could optimize the runtime of my code? One requirement, at least for now, is that I need to run it recursively, but that's the only constraint. – Xadash Nov 03 '20 at 06:52

0 Answers0