
I am trying to understand the FPU, and I am pretty confused. As I understand from here, the FPU has its own stack. But consider this code (NASM):

global _main

extern _printf

section .data
    hellomessage db `Hello World!\n`, 10, 0
    numone dd 1.2
    digitsign db '%f', 0xA, 0

section .text
_main:
    ;Greet the user
    push hellomessage
    call _printf
    add esp,4

    sub esp, 8

    fld dword[numone]
    fstp qword[esp]

    push digitsign
    call _printf
    add esp, 12
    ret

I have to have the sub esp, 8 line to "make space" for a double, otherwise the program crashes. But by doing this, I change the pointer of the "regular stack", which does not make sense with my idea of two separate stacks.

I am certain that I do not understand something, but I do not know what this is.

Peter Cordes
  • 3
    Because you used `fstp qword [esp]` which transfers from FPU stack to CPU stack. Hence you need space on CPU stack. – Jester Apr 15 '18 at 23:54

2 Answers

6

x87 loads/stores use the same memory addresses that everything else does. The x87 stack is registers st0..st7, not memory at all.

See SIMPLY FPU: Chap. 1 Description of FPU Internals for details on the x87 register stack.

`fstp qword [esp]` stores 8 bytes to the regular call stack, just as `mov [esp], eax` / `mov [esp+4], edx` would. Addressing modes don't change meaning when used with x87 load/store instructions! Your process has only one address space.
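To make the equivalence concrete, here is a minimal sketch of two routines that leave the same 8 bytes at `[esp]`, one going through an x87 register and one using integer registers. The labels `value`, `store_via_x87`, and `store_via_mov` are made up for illustration, and the code assumes 32-bit mode like the question:

```nasm
section .data
    value dq 1.2                ; hypothetical 8-byte double constant

section .text
; Version 1: go through an x87 register.
store_via_x87:
    sub  esp, 8                 ; reserve 8 bytes on the call stack
    fld  qword [value]          ; st0 = value (an x87 register, not memory)
    fstp qword [esp]            ; write 8 bytes at [esp], pop the x87 register stack
    add  esp, 8                 ; release the scratch space
    ret

; Version 2: copy the same 8 bytes with integer registers.
store_via_mov:
    sub  esp, 8
    mov  eax, [value]           ; low 4 bytes of the double
    mov  edx, [value + 4]       ; high 4 bytes of the double
    mov  [esp], eax
    mov  [esp + 4], edx
    add  esp, 8
    ret
```

In both versions `[esp]` means the same memory location; only the register that supplies the data differs.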


So if you remove the `sub esp, 8`, your `fstp` would overwrite your return address.

Then at the end of the function, `add esp, 12` would leave `esp` pointing 8 bytes above the return-address slot, so `ret` pops some garbage into EIP. You then segfault either when trying to fetch code from that bad address, or when the bytes there decode to instructions that fault.

Above main's return address, you'll find argc and then char **argv. It's a pointer to an array of pointers, so using it as a return address will mean you execute pointer values as code. (If I got this right.)
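As a rough illustration, here is the working sequence with `esp` tracked in the comments, where "entry" stands for the value of `esp` when `_main` starts. It assumes the same 32-bit cdecl environment and symbol names as the question, and the stack layout above the return address is the one described in the previous paragraph:

```nasm
global _main
extern _printf

section .data
    numone    dd 1.2
    digitsign db '%f', 0xA, 0

section .text
_main:                      ; [entry] = return address, [entry+4] = argc, [entry+8] = argv
    sub  esp, 8             ; esp = entry-8: reserve 8 bytes below the return address
    fld  dword [numone]     ; st0 = 1.2 (x87 register; esp is unchanged)
    fstp qword [esp]        ; writes bytes entry-8 .. entry-1; return address untouched
                            ; (without the sub, this store would cover entry+0 .. entry+7,
                            ;  i.e. the return address and argc)
    push digitsign          ; esp = entry-12
    call _printf            ; cdecl: the caller cleans up the arguments
    add  esp, 12            ; esp = entry again, so ret finds the return address
    ret
```

Without the `sub`, the final `add esp, 12` would leave `esp` at entry+8, which is why `ret` would pop the `argv` slot instead of the return address.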

Use a debugger to see what happens to registers and memory as you single step.


Note that `add esp, 4` / `sub esp, 8` is a bit silly. `add esp, +4 - 8` (i.e. `add esp, -4`) would be a self-documenting way to do that with one instruction.
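A sketch of the whole program with that single adjustment might look like the following. It keeps the question's symbol names and assumes the same 32-bit cdecl toolchain; the greeting string is simplified to one newline:

```nasm
global _main
extern _printf

section .data
    hellomessage db 'Hello World!', 10, 0
    numone       dd 1.2
    digitsign    db '%f', 0xA, 0

section .text
_main:
    push hellomessage
    call _printf
    add  esp, +4 - 8        ; pop the 4-byte argument and reserve 8 bytes: net esp -= 4

    fld  dword [numone]     ; st0 = 1.2
    fstp qword [esp]        ; store it as a double in the reserved 8 bytes

    push digitsign
    call _printf
    add  esp, 12            ; drop the format pointer (4) and the double (8)
    ret
```

NASM evaluates `+4 - 8` at assemble time, so this assembles to the same instruction as `add esp, -4`.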

Peter Cordes
  • Using the `char **argv` pointer as a return address is likely to SEGV, since that memory is not going to be marked as executable (at least, it shouldn't be in any modern install). – Alexis Wilke Mar 20 '23 at 20:42
  • 1
    @AlexisWilke: Yes, true. Although the OP's program doesn't have a `section .note.GNU-stack noalloc noexec nowrite progbits` ([Unexpected exec permission from mmap when assembly files included in the project](https://stackoverflow.com/q/58260465)) so the toolchain defaults will unfortunately link it with an executable stack at least (modern Linux kernel) or READ_IMPLIES_EXEC (older kernel before about 5.9), if this is Linux at all. But yeah, that's another thing to look for when using a debugger to see why something crashes. – Peter Cordes Mar 20 '23 at 21:02
3

The FPU has a "register stack" (and not a stack in RAM).

Essentially, there are 8 registers (let's call them FPU_R0, FPU_R1, ..., FPU_R7) and 8 names (let's say they're st0, st1, ..., st7), and there's a "top of FPU stack" value that determines which name refers to which register.

You can push new values onto the FPU register stack. For example:

    fld qword [A]     ;st0 = FPU_R7 = A
    fld qword [B]     ;st0 = FPU_R6 = B, st1 = FPU_R7 = A
    fld qword [C]     ;st0 = FPU_R5 = C, st1 = FPU_R6 = B, st2 = FPU_R7 = A

You can pop values from the FPU register stack. For example:

                      ;st0 = FPU_R5 = C, st1 = FPU_R6 = B, st2 = FPU_R7 = A
    fstp qword [C]    ;st0 = FPU_R6 = B, st1 = FPU_R7 = A
    fstp qword [B]    ;st0 = FPU_R7 = A
    fstp qword [A]    ;(FPU register stack is empty again)
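Arithmetic and non-popping stores work on the same register stack. The following is only a sketch with made-up labels (`x87_demo`, `product`, `product_copy`) and example values, showing `fmul` with an explicit `st1` operand and the difference between `fst` and `fstp`:

```nasm
section .data
    A            dq 2.0
    B            dq 3.0
    product      dq 0.0     ; hypothetical output slots
    product_copy dq 0.0

section .text
x87_demo:
    fld  qword [A]              ; st0 = 2.0
    fld  qword [B]              ; st0 = 3.0, st1 = 2.0
    fmul st0, st1               ; st0 = 6.0, st1 = 2.0 (no push, no pop)
    fst  qword [product_copy]   ; store st0 WITHOUT popping: st0 = 6.0, st1 = 2.0
    fstp qword [product]        ; store st0 and pop: st0 = 2.0
    fstp st0                    ; discard the last value; the register stack is empty again
    ret
```

None of these instructions change `esp`; only the memory operands in square brackets refer to ordinary memory.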
Ped7g
Brendan
  • 2
    I don't think this answer deserves downvotes. The first sentence is the answer to the OP's confusion, the rest is demonstrating how the x87 stack works (and that it's a register stack, nothing to do with the call stack in memory.) It's not a great answer, which is why I wrote my own; it barely answers the question. But everything it says is actually correct. (err, except that st0 is repeated in a couple of the comments. I think you have a copy/paste + editing error and forgot to increment your var names). It'd also be good to show `fmul st3` or something. – Peter Cordes Apr 16 '18 at 02:52
  • 1
    This answer is an essential *part* of the full answer; who downvoted this and why? It doesn't explain anything about the regular stack from the OP's question, but it explains the FPU "stack" part correctly, which by itself should technically be enough to clear up the OP's confusion and fix their problem. (About the quality of this answer: maybe even showing the difference between `fst` and `fstp` would make some sense.) (edit2: nor do I get the downvote of the OP's question; it's a bit low effort, as the OP was probably confused after reading about the FPU for the *first* time, but the question is reasonably clear in showing what the confusion is and what was tried) – Ped7g Apr 16 '18 at 07:08
  • This answer certainly **does not** deserve down-votes. Since it's not as complete as the one from PeterCordes, though, I will just up-vote it. Nevertheless, I thank you VERY much @Brendan for your time and effort on this one :) –  Apr 16 '18 at 09:51
  • 1
    The question was why the `sub esp` is needed, not how the FPU stack works. This answer doesn't provide any insight into that. But it's not wrong, per se. – Jester Apr 16 '18 at 10:16