How to code factorial of an input with Recursivity in NASM x86

Question

%include "asm_io.inc"



segment .data

segment .bss



segment .text

    global asm_main

asm_main:


    enter 0,0
    pusha



    call    read_int

    push eax
    call fak_rekursiv
    add esp, 4



    call print_int
    call print_nl

    


    popa
    mov eax, 0
    leave
    ret

fak_rekursiv:

    enter 4, 0
    pusha

    

    mov eax, [ebp + 8]

    

    cmp eax, 0
    je ergebnis_1


    cmp eax, 1
    je ergebnis_1

    

    

    mov ebx, eax
    dec ebx
    mul ebx
    push ebx
    call fak_rekursiv
    pop ebx


            

    ergebnis:
        mov [ebp - 4], eax
        

    ergebnis_1:

        mov [ebp - 4], dword 1 

    popa
    mov eax, [ebp - 4]
    leave
    ret

I am learning to code on NASM and I was trying to understand recursion through using coding factorial but I got confused quickly.
How can I use Recursion in NASM to code factorial algorithm?

PIRIQITI · Answer 1 · 2023-01-14T14:51:00.733

2

actually I have coded up factorial just for fun earlier. And now I have dug through my files to find that code :) here you go, but it is more like a pseudo-code which I only stepped through debugger.

factorial:
    push    eax
    test    eax, 07FFFFFFEh
    jz      .exit
    dec     eax
    call    factorial
    mul     dword [esp]
.exit:
    pop     edx
    ret

put different values in eax and step through it in debugger.

edited Jan 14 '23 at 14:51

answered Jan 13 '23 at 21:38

PIRIQITI

149
7

1

Yeah that should work. It pushes the numbers from `n-1` down to `2` onto the stack (interleaved with return addresses), then multiplies them as it returns back up the call tree. It takes an arg in EAX and returns in EAX. (Unlike the code in the question which pushes an arg before the call; but it's in EAX anyway so it works either way.) BTW, "pseudo-code", not "sudo". – Peter Cordes Jan 14 '23 at 04:05
wow, that's such a concise code. It worked but I want to understand deeply what happened there. you pushed eax into the stack, why? what is the role of "test" command and what does 07FFFFFFEh do. how can we reach mul dword [esp] since I see only two scenarios either jump to exit or call factorial again why did you pop edx from the stack since when was it in the stack? I am sorry for asking a lot of question. I am just confused. it's been probably less than a week since I started learning assembly – Mouad Meziani Jan 14 '23 at 13:57
So, ```test``` checks if ```eax``` has reached ```1``` and if it did, it ends calling factorial. ```test``` is used there because it has a better performance compared to ```cmp``` which is fine to use in this code as well: ```cmp eax, 1``` will do the exact thing. Notice when we have ```1``` encountered, it is pushed on the stack, but then ```jz .exit``` jumps to ```.exit``` and ```pop edx``` pops last thing from the stack which turns out to be ```1```. With 1 gone from the stack we return to the next address after ```call factorial``` instruction in code and ```mul dword [esp]``` multiplies – PIRIQITI Jan 14 '23 at 14:43
multiplies the next thing on stack on ```eax``` so we get ```1 * 2``` then ```pop edx``` pops ```2``` from the stack, ```ret``` retreturns to ```mul``` again and so on. after everything is done last thing ```ret``` will see on the stack is the address of where it came from, from the main program or from the place you called ```factorial``` and returns to that location. let me know if anything else is still unclear, I'll happily explain :) – PIRIQITI Jan 14 '23 at 14:46
1

Those comments explaining it should be part of the actual answer, especially since it's lacking any explanation or comments to start with. BTW, `cmp eax, 2` / `jl .exit` should work just as well, with smaller code-size and otherwise equal efficiency. (Or `jb` if you want to allow large unsigned numbers.) Or am I missing something and there's some corner case where the `test eax,imm32` does something different? – Peter Cordes Jan 14 '23 at 18:16
1

How is this 'pseudo-code'? This is actual x86 code. – Sep Roland Jan 14 '23 at 19:58
@PeterCordes when done on larger numbers ```test``` turned out to work faster. I tested it on ubuntu with ```time``` command, but IDK, maybe processor cycles are similar for both of those. – PIRIQITI Jan 14 '23 at 22:14
`cmp eax,2`/jb` and `test eax, imm32` / `jnz` execute *identically* in the back-end of all x86 CPUs for decades. https://agner.org/optimize/. If there was a real effect, not just measurement error, it was only from the different in code-size (and therefore perhaps alignment of some later instruction, such as a branch). – Peter Cordes Jan 15 '23 at 05:03
If the top of the function starts at `align 16`, none of the `jcc` / `call` or `ret` could touch the bottom of a 32-byte boundary so [Skylake's JCC erratum](https://stackoverflow.com/questions/61256646/how-can-i-mitigate-the-impact-of-the-intel-jcc-erratum-on-gcc) is probably not the cause, unless you didn't align your code. – Peter Cordes Jan 15 '23 at 05:07
With `cmp eax,2`/`jb` the total size is 17 bytes, so the `ret` is at the start of a new 16-byte chunk. If your test happened to have the function start in the last 16 bytes of a cache line, that would put `ret` in a new I-cache line, so `pop` and `ret` couldn't fetch from the uop cache in the same cycle. But with `test`, the `pop` and `ret` both start later than 16 bytes into the function. – Peter Cordes Jan 15 '23 at 05:08
I assume your test harness called it multiple times with a large input, to amortize the cost of page-faulting and growing the stack to make room for those saved EAX values and return addresses... Anyway, for a general answer with no CPU specified, longer less-obvious machine code that happens to be faster in one specific case on your CPU isn't a good choice for an SO answer. It's worth *mentioning* as an alternative to the standard `cmp`/`jb` method, if you want to discuss your performance experiment, but otherwise shorter machine code with equal-cost instructions is in general better. – Peter Cordes Jan 15 '23 at 05:11

vengy · Answer 2 · 2023-01-17T02:12:30.637

1

Not too familiar with NASM, but here's a MASM solution which calls the factorial function recursively.

factorial.asm

.386
.model flat, stdcall
option casemap :none

includelib \masm32\lib\msvcrt.lib
printf PROTO C, :VARARG

.data
fmt db "%u! = %u", 13, 10, 0  

.code

factorial proc
    cmp     eax, 0   ; Special-case to account for 0! = 1
    je      retone
    cmp     eax, 1
    je      retnow
    push    eax
    dec     eax
    call    factorial
    pop     edx
    imul    eax, edx ; eax = eax * edx 
    
retnow:
    ret
    
retone:
    mov     eax, 1
    ret    
factorial endp

main:   

    ; Loop from 0 to 12 (max) factorial for 32-bit code.
    mov     ebx, 0  
num:
    mov     eax, ebx
    call    factorial
    invoke  printf, OFFSET fmt, ebx, eax
    inc     ebx
    cmp     ebx,13   
    jne     num
    
    ret
end main

Displays all factorials from 0 to 12

edited Jan 17 '23 at 02:12

answered Jan 16 '23 at 18:17

vengy

1,548
10
18

In normal calling conventions, EBX is call-preserved. Use ECX or EDX (the other two call-clobbered registers) for `pop ecx` / `imul eax, ecx`. Or if the caller wants to use the full 64-bit EDX:EAX result, then sure use one-operand `mul` to allow an output range of `n * 2^32` since you do that multiplication last. (If any of the earlier multiplies overflow 32 bits, the result will be truncated.) – Peter Cordes Jan 17 '23 at 00:46
1

`cmp eax, 1`/`je` will loop 2^32 iterations for an input of `0`. You might as well do `jb` to also return in that case, with EAX=0. (Mathematically, `0!` is normally considered to be `1`, so you'd need to special-case the base-case return path to always return `1` instead of `n` if you wanted to handle that. Using `jb` would make it actually return `0` instead of overflowing the stack by trying to push 32GiB of stuff (4G 4-byte EAXes and return values). Or of course use `jl` to treat the input as signed and return it instead of crashing. – Peter Cordes Jan 17 '23 at 00:51
OTOH, crashing with a stack overflow may be better than a "wrong" answer when passed a bad arg, so `je` could be seen as an intentional `assert` that values are small positive. At least if there was a comment to that effect. :P – Peter Cordes Jan 17 '23 at 00:52
The comment on `mul edx ; eax = eax * edx` isn't fully accurate. It actually does `EDX:EAX = eax * edx`. If you want to only set EAX, use the more efficient instruction, `imul eax, edx`. (only 1 uop instead of 3 on Intel CPUs; https://uops.info/, https://agner.org/optimize/). Or if you do keep using `mul`, you could adjust the caller to use `%llu` for the factorial and push EDX then EAX. So in the `invoke` list, `..., eax ,edx` – Peter Cordes Jan 17 '23 at 02:03

How to code factorial of an input with Recursivity in NASM x86

2 Answers2