Keep getting NaN after doing some calculations in assembly

Question

I'm working on a program that solves for the roots of a quadratic equation. I was able to get the first root in the root1 subroutine. However, when I try to solve for the second root in root2, the "/2a" part of the quadratic formula keeps yielding NaN.

Here's the code:

INCLUDE Irvine32.inc    
INCLUDE macros.inc

.data                                                   
a real8 ?
b real8 ?
cc real8 ?

a2 real8 ?
b2 real8 ?
cc2 real8 ?

two real8 2.0
four real8 4.0

two2 real8 2.0
four2 real8 4.0

ten real8 1000.0
num real8 10.0


.code                                   
main PROC       
    finit 
    mWrite "Enter coefficient (a): "
    call ReadFloat
    fst a
    fstp a2

    mWrite "Enter coefficient (b): "
    call ReadFloat
    fst b
    fstp b2

    mWrite "Enter coefficient (c): "
    call ReadFloat
    fst cc
    fstp cc2
    
    mWrite "Roots: "
    call root1
    call Crlf
    call root2

    ;call showfpustack
exit
main ENDP

root1 PROC
    
    ; b^2
    fld b
    fmul b
    fchs    ; flip sign
    fst b
    
    ; 4 * a * c
    fld four
    fmul a
    fmul cc
    fchs
    fsub b 
    fsqrt
    fst four


    fld b
    fchs
    fsqrt
    

    fchs
    fadd four
    fst b

    fld two
    fmul a
    fst two


    fld b
    fdiv two

    call WriteFloat
    call showfpustack

    ret
root1 endp

root2 PROC

    fld b2
    fmul b2
    fchs
    fst b2


    fld four2
    fmul a2
    fmul cc2
    fchs
    fsub b2
    fsqrt
    fst four2


    fld b2
    fchs
    fsqrt


    fchs
    fsub four2
    fst b2

    call Crlf
    call WriteFloat

    fld two2
    fmul a2
    fst two2

    fld b2
    fdiv two2

    call showfpustack
    ret
root2 endp
end main

I was able to verify the results of previous calculations. It's just this part that I'm having trouble with.

Instead of `fmul` by 2, you can just `fadd st0` to double the current top-of-stack value. But anyway, when you single-step in a debugger, where does NaN appear? Are you taking the sqrt of a negative number? Or are you overflowing the stack with all those `fld` stack pushes and no pops? Also, why would you overwrite your constant `4.0` with a fsqrt result? That's weird, and means you can't call your function again. — Peter Cordes, Nov 19 '20 at 06:41
I'm still getting NaN. st(0) is missing when I call showFPUstack. — Nico, Nov 19 '20 at 06:48
But not from `WriteFloat`? Does `WriteFloat` modify FP registers? Use a debugger to find out, instead of wasting your time only calling printing functions. Or do you mean in root2, which only uses `showfpustack`? — Peter Cordes, Nov 19 '20 at 06:52
st(0) becomes 1#IND when I load a2 onto it. Any idea why that happens? — Nico, Nov 19 '20 at 07:01
You mean with `fmul a2`? That doesn't just load, it does `st0 *= a2`. But anyway, what were the old values of `st0` and `a2` at that point? Are you sure it didn't go to NaN at `fld two2`? That would be expected if all eight `st0` .. `st7` x87 stack registers were already in use before the load; like I said because you never pop anything. — Peter Cordes, Nov 19 '20 at 07:13
You could also make much better use of the x87 stack instead of storing to memory and reloading all the time, and stuff like `fdivr b2` to do `st0 = b2 / st0` instead of that store / load / fdiv. Or you could `fld b2` / `fxchg` / `fdivp` or something, but that's obviously worse. Like I said, you don't need to overwrite your constants, that's just confusing for anyone reading your code. — Peter Cordes, Nov 19 '20 at 07:15

Sep Roland · Accepted Answer · 2020-11-19T19:47:54.730

Instead of speculating about why you get a NAN, I'll walk you through the code that you need to write in order to calculate the roots of the quadratic equation.
You see that you don't need any constants in memory nor do you need multiple copies of the inputs a, b, and c.

Reset the FPU environment

fninit

Store an interesting constant that we will re-use muliple times:

fld     a             ; (st0) a
fadd    st0           ; (st0) a + a == 2a

Calculating D = b^2 - 4ac

fld     b             ; (st0) b                 (st1) 2a
fmul    st0           ; (st0) b * b == b^2      (st1) 2a

fld     c             ; (st0) c                 (st1) b^2  (st2) 2a
fmul    st2           ; (st0) c * 2a == 2ac     (st1) b^2  (st2) 2a
fadd    st0           ; (st0) 2ac + 2ac == 4ac  (st1) b^2  (st2) 2a

fsubp                 ; (st0) b^2 - 4ac == D    (st1) 2a

Here we need to test if D is not negative because no roots exist for D<0.

ftst                  ; This compares st0 to 0.0 and sets flags C3, C2, and C0
fnstsw  ax            ; Copies those flags to AX
sahf                  ; Copies those flags to EFLAGS
jp      IsUnordered   ; This should not happen
jc      IsNegative    ; This is very possible

We can take the square root only if D is not negative.

fsqrt                 ; (st0) sqrt(D)                   (st1) 2a

Going for the 1st root R1:

fld     b             ; (st0) b                         (st1) sqrt(D)  (st2) 2a
fchs                  ; (st0) -b                        (st1) sqrt(D)  (st2) 2a
fsub    st1           ; (st0) -b - sqrt(D)              (st1) sqrt(D)  (st2) 2a
fdiv    st2           ; (st0) (-b - sqrt(D)) / 2a == R1 (st1) sqrt(D)  (st2) 2a

Going for the 2nd root R2:

fld     b             ; (st0) b                         (st1) R1  (st2) sqrt(D)  (st3) 2a
fchs                  ; (st0) -b                        (st1) R1  (st2) sqrt(D)  (st3) 2a
fadd    st2           ; (st0) -b - sqrt(D)              (st1) R1  (st2) sqrt(D)  (st3) 2a
fdiv    st3           ; (st0) (-b - sqrt(D)) / 2a == R2 (st1) R1  (st2) sqrt(D)  (st3) 2a

At this point the FPU register stack has only 4 entries:

st3 = 2a
st2 = sqrt(D)
st1 = R1
st0 = R2

Please note that nowhere in this code did I have to store anything back in memory. Working with the FPU is a matter of careful planning. Re-arranging an expression is often advantageous and re-using previously calculated values is always a win.

You don't need `fninit` if your other code isn't buggy, so the x87 stack is already empty. But maybe if you don't trust the CRT code to not have gimped the precision setting, you'd want to reset it? `fninit` at the start of a calculation is usually a sign of bad code that doesn't balance the x87 stack. — Peter Cordes, Nov 19 '20 at 18:52
It might be good to show how to use comments to keep track of what's in the FP stack at every step, since you're trying to make the point that this is what you should do instead of storing and reloading. Noting the x87 stack contents in comments with each instruction is how you record that planning, and how you figure out which `st` register to reference. — Peter Cordes, Nov 19 '20 at 19:00
`fld b` / `fchs` / `fadd st1` - couldn't this simply be `fsubr b` to do `st0 -= b`? Same for the first root, `fchs` / `fsubr`. Oh, I see you're doing both roots in the same function. So you could `fld b` / `fchs` / `fld st0` to push a copy of the negated `b`. — Peter Cordes, Nov 19 '20 at 19:01
@PeterCordes I was under the impression that my comments already showed what was in which FPU register. And for this short calculation I think it could suffice. The use of the *reverse* instructions is always a bit tricky and so I didn't use those. — Sep Roland, Nov 19 '20 at 19:14
You're showing what's in `st0`. I've sometimes written every comment as a `,` or `;`-separated list of the full x87 stack contents at every step, which I think can be helpful to help a beginner visualize the stack operation. e.g. in [Cube root on x87 FPU using Newton-Raphson method](https://stackoverflow.com/a/36925678). It can easily get noisy :/. But the OP seems to lack an understanding of the x87 stack, given the lack of pops and the insane store/reloads instead of fxch or fsubr or w/e, probably leading to NaN from loading into an already-full register. So I think that might help. — Peter Cordes, Nov 19 '20 at 19:18
Nice edit, comments work nicely here. Perhaps good to show using `ffree st(2)` and `ffree st(3)` to clean up the x87 stack. Or `fstp` the results, then `fstp st(0)` twice. Or `fcompp` to double-pop. Cleaning the x87 stack is an important part of writing a function that doesn't need later functions to use `fninit`. — Peter Cordes, Nov 19 '20 at 19:53

Keep getting NaN after doing some calculations in assembly

1 Answers1