1

We had an assignment where we had to write the collatz conjecture in 64bit nasm assembly with only 13 commands or less (RET included). Now we are wondering how much you can actually reduce it. We are currently on 9.
Heres the collatz conjecture in pseudo code for reference:

enter image description here

Heres the code we have so far. A few notes:
A tutor of us said we can remove the XOR rax, rax because of some calling convention it's already zero. It doesn't work on my computer though so I've included it here.
I'm aware the two LEA's are probably the most obvious thing to reduce, but we can't think of a way since *6 seems to be the only thing thats literally impossible to do with LEA.

GLOBAL collatz
SECTION .text

collatz:
    XOR rax, rax

    .while:
        SHR rdi, 1
        JNC .even
            LEA rdi, [rdi*2+1]
            LEA rdi, [rdi*2+rdi+1]
        .even:

        INC rax

        CMP rdi, 1
        JA .while
    RET
nn3112337
  • 459
  • 6
  • 21
  • Don't spam tags! And what's your **specific** problem? – too honest for this site Feb 10 '17 at 00:01
  • 2
    I'm voting to close this question as off-topic because this is really more of a "code golf" question. – David Hoelzer Feb 10 '17 at 00:04
  • @DavidHoelzer ah I didnt realize it wasnt allowed since its technicaly solvable. Where do you recommend asking this? – nn3112337 Feb 10 '17 at 00:06
  • I'm not saying it's not allowed... I'm simply voting to close it since, in my opinion, it's off topic. :) There are some stacks that are geared more toward code review and toward "code golf" questions. This is more of a, "I can't figure out why this code doesn't work" stack. :) – David Hoelzer Feb 10 '17 at 00:07
  • Be careful converting a `while` loop into a `do-while`. Also, `LEA` can do *5, but you actually need *6 and that's the problem :) No, you can't remove the `xor rax`. PS: note that x86 instructions are variable length, so even less instructions might be more bytes. – Jester Feb 10 '17 at 00:08
  • works for this case tho. Yeah sorry thats what I meant, I'll edit it (and post it on the code golf stack, theres actually one just for that lol) – nn3112337 Feb 10 '17 at 00:09
  • 2
    [We have previously analyzed the Collatz conjecture in *great* detail](http://stackoverflow.com/questions/40354978/why-is-this-c-code-faster-than-my-hand-written-assembly-for-testing-the-collat). – Cody Gray - on strike Feb 10 '17 at 11:36
  • 1
    *"A tutor of us said we can remove the XOR rax, rax because of some calling convention it's already zero."* No, that's totally wrong. Under no calling convention is `RAX` guaranteed to be 0. – Cody Gray - on strike Feb 10 '17 at 11:45
  • @CodyGray Interesting. Funny how I came up with the exact same solution used there. – fuz Feb 10 '17 at 11:49
  • @CodyGray Using the amd64 SysV ABi, `al` contains the number of SSE registers used for arguments if no prototype is is present or the function prototype contains a variable argument list. Since this function takes no floating point arguments, `al` will be zero on entry. Source: @larkey, who made this exercise, told me. – fuz Dec 04 '17 at 12:16

1 Answers1

2

This is somewhat shorter:

collatz:
        or     $-1,%eax
.loop:  inc    %eax
        lea    1(%rdi,%rdi,2),%rsi
        shr    %rdi
        cmovc  %rsi,%rdi
        jnz    .loop
        ret

or, in nasm syntax:

collatz:
        or     eax,-1
.loop:  inc    eax
        lea    rsi,[rdi+rdi*2+1]
        shr    rdi
        cmovc  rdi,rsi
        jnz    .loop
        ret

To understand this code, pay close attention to the carry flag (CF) and zero flag (ZF).

fuz
  • 88,405
  • 25
  • 200
  • 352
  • I believe `rax` needs to be incremented always, so the `adc` is wrong. – Jester Feb 10 '17 at 01:03
  • Looks okay now. – Jester Feb 10 '17 at 01:38
  • Wow thats really smart. Overnight we had something similiar with adc, but this is great, thank you. – nn3112337 Feb 10 '17 at 11:41
  • Note that `or eax, -1` has a false data dependency on `eax`, so while this is a *short* way to set a register to -1, it is not a *fast* way. If you want speed, and are willing to give up a couple of bytes, you would use `mov eax, -1`. (`xor eax, eax` + `dec eax` is another option that avoids the false data dependency and is still shorter than `mov`.) Probably not especially relevant either way, though, since it's outside of the loop. – Cody Gray - on strike Feb 10 '17 at 11:47
  • @CodyGray An even shorter way is probably `push $-1; pop %rax` since that's only three bytes. I was using the `or` since we are optimizing for size, not speed. – fuz Feb 10 '17 at 11:48
  • @CodyGray yeah speed isnt the issue, code golfing is. altough it seems the loop itself cant be reduced under 5 operations. btw thanks for your comprehensive link above, really interesting read – nn3112337 Feb 10 '17 at 11:48