
For example, the x86 ISA has dedicated instructions for stack operations, such as push and pop, but a stack can be implemented entirely in software with ordinary memory-access instructions like mov, and the latter may even perform better: Why use Push/Pop instead of Mov to put a number in a register in shellcode?

In ARM, push and pop are just aliases for memory operations. For reference: Push and Pop in arm

Why do we need to make the ISA aware of the existence of a stack? Why don't we just let the hardware forget about the "stack" and leave it to software? As far as I can see, this would have two advantages:

  1. the hardware design can be simplified,
  2. the software gets more flexibility.

Can an ISA be implemented without the stack concept, i.e., without push, pop, %rsp, %rbp, and the like?

TylerH
tristone
  • ARM has optional increment/decrement in the instruction so it can be simply aliased. Also note a lot of things can be implemented in software so by that logic why have multiple instructions (see [OISC](https://en.wikipedia.org/wiki/One-instruction_set_computer)). – Jester Dec 17 '22 at 13:26
  • `even the later has a better performance:` You're misreading that answer. This talks about comparing `push` followed by `pop` to a single `mov`, not whether implementing pushing and popping in software is generally faster than the provided instructions. – tkausl Dec 17 '22 at 13:28
  • This is basically asking "Why did the designers of different CPUs make different design decisions?" Answer: Because if they always made the same decisions, you wouldn't have different CPUs. Some CPUs have a dedicated stack register. Others (mostly RISC philosophy) do not. But it's not a pure CISC/RISC split. ARM64 is RISC-style but it has a dedicated stack register. – Raymond Chen Dec 17 '22 at 14:24
  • "Why" questions like this have no answer and are not SO questions... please rewrite to make it a real question. – old_timer Dec 17 '22 at 17:49
  • I thought ARM *did* use the stack implicitly in some interrupt-handling cases? Or does it always just use banked registers to avoid doing memory access? Also, Thumb mode has `push` and `pop` instructions with implicit use of the stack pointer. MIPS does what you're suggesting; the stack is purely a software convention, not used implicitly by hardware ever. (Exceptions clobber a couple of the architectural registers, I think.) – Peter Cordes Dec 17 '22 at 20:17
  • 1. Yes, of course, many RISCs are like that if you just mean not having special instructions with an implicit register operand. 2. Yes, MIPS is like that, to my understanding. Not even exception handling uses the stack implicitly. – Peter Cordes Dec 19 '22 at 02:50
  • This might be more suitable on another site in the SE network, see https://stackexchange.com/sites#technology – mousetail Dec 19 '22 at 07:41
  • Maybe you can make your question more specific, such as, "What is the benefit of having a dedicated stack pointer register?". – xiver77 Dec 19 '22 at 17:18
  • @xiver77 I think you are right, let me create a new question. – tristone Dec 20 '22 at 02:42

1 Answer

Why do we need to make the ISA aware of the existence of a stack? Why don't we just make the hardware forget the "stack" and just leave it to software implementations?

It's not required for a CPU to work, that's for sure. There are indeed CPUs that do this; MIPS is one of them. While there is a register known as $sp, there is no hardware-enforced difference between it and the other general-purpose registers. MIPS has no push or pop instructions that alter $sp implicitly. To push a register onto the stack in MIPS assembly, you have to do it manually:

    addiu $sp,$sp,-4    # make room: move the stack pointer down by one word
    sw $s0,0($sp)       # store $s0 at the new top of the stack
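
The same idea can be sketched in C, as a rough model rather than anything taken from the MIPS manuals: the stack is just an array plus an ordinary pointer, and push and pop are each a pointer adjustment plus a plain store or load. The names `soft_stack`, `sp`, `push_word`, and `pop_word` are hypothetical, chosen only for illustration, and the sketch does no overflow checking.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 16

/* Hypothetical software stack: plain memory plus an ordinary pointer. */
static uint32_t soft_stack[STACK_WORDS];
static uint32_t *sp = soft_stack + STACK_WORDS;  /* empty, descending stack */

/* Roughly mirrors: addiu $sp,$sp,-4 ; sw $s0,0($sp) */
static void push_word(uint32_t value) {
    sp -= 1;       /* move the pointer down by one word */
    *sp = value;   /* ordinary store, nothing stack-specific */
}

/* Roughly mirrors: lw $s0,0($sp) ; addiu $sp,$sp,4 */
static uint32_t pop_word(void) {
    uint32_t value = *sp;  /* ordinary load */
    sp += 1;               /* move the pointer back up by one word */
    return value;
}
```

Calling `push_word(10)` and `push_word(20)` and then `pop_word()` twice returns 20 and then 10, and nothing in the code treats `sp` differently from any other pointer.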

So why have the hardware be aware of the stack? One benefit seems to be that, on most architectures that do have a "hardware stack," the pop operation tends to be faster than loading the destination register with a value from memory, and this is the case even if you don't count the cost of loading the source register with the desired address. Some assembly programmers would abuse this by temporarily relocating the stack to an array (whose values were stored in reverse order, of course) and using pop to load each entry into a different register. This has some overhead, since you need to disable interrupts and save the stack pointer somewhere else in memory. Even so, under very specific circumstances you could copy from memory just a little bit faster than normal, depending on how many values need to be read, what addressing modes the CPU allows, etc.
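
As a toy illustration of that trick, here is a minimal C model, assuming a descending stack where pop reads a word and then moves the stack pointer up. The `toy_cpu` struct, `toy_pop`, and `load_three_via_pop` are hypothetical names, not any real CPU's interface. The model only shows the data flow; it says nothing about timing.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a CPU with a stack pointer and three registers. */
struct toy_cpu {
    const uint16_t *sp;   /* stack pointer, temporarily aimed at a data table */
    uint16_t r0, r1, r2;
};

/* Model of a hardware pop: a single operation that loads AND bumps sp. */
static uint16_t toy_pop(struct toy_cpu *cpu) {
    return *cpu->sp++;
}

/* The trick: save sp, aim it at the table, pop into registers, restore sp.
   Real code would also have to disable interrupts around this sequence. */
static void load_three_via_pop(struct toy_cpu *cpu, const uint16_t *table) {
    const uint16_t *saved_sp = cpu->sp;
    cpu->sp = table;
    cpu->r0 = toy_pop(cpu);
    cpu->r1 = toy_pop(cpu);
    cpu->r2 = toy_pop(cpu);
    cpu->sp = saved_sp;
}
```

Because pop walks upward through memory on a descending stack, the table has to be laid out in the order the pops should see it, which is the reverse of the order in which pushes would have written it.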

puppydrum64
  • The CPUs you're talking about where `pop` is faster than a load are presumably old CPUs or microcontrollers. Presumably ones without a post-increment addressing mode for loads in general, unlike ARM (or x86 with `lodsw`). So you're saving a pointer-increment instruction, not that pop is faster than a pure load. (Except maybe with variable-length instructions to also save some code-fetch bandwidth, like on 8086.) – Peter Cordes Dec 21 '22 at 23:17
  • Or on modern CPUs, there's actually dedicated hardware to avoid a bottleneck on stack-pointer updates from multiple push/pop instructions, so x86 `pop` can actually be abused to read an array slightly faster than `lodsd`, if you're not going to unroll. It's almost always better just to unroll a loop to amortize a pointer increment, but here's a code-golf challenge that includes a performance component: [Extreme Fibonacci](https://codegolf.stackexchange.com/a/135618) - 105 bytes for the first 1000 decimal digits of Fibonacci(1 billion), where I used `pop` to loop arrays. – Peter Cordes Dec 21 '22 at 23:18
  • Yeah I was referring mostly to 8-bit micros, it's what I'm most familiar with. – puppydrum64 Dec 22 '22 at 11:11
  • On pipelined CPUs (like MIPS was designed to be from the start), having a `push` or `pop` is an obstacle to running multiple instructions per clock cycle because the stack pointer changes with every push/pop. So it creates dependency chains, and extra operations to write-back new values to it separate from the loads / stores. And pop has two register outputs. x86 since Pentium M pays the price for a [stack engine](https://stackoverflow.com/q/36631576/224132) to "rename" those updates to the stack pointer; other ISAs favour code-gen that doesn't do tons of push/pop back to back. – Peter Cordes Dec 22 '22 at 11:15