I am mainly thinking about Windows.
AFAIK on such platforms there are many stacks: each program, or maybe even each thread, has its own stack, and each thread can push bytes onto it. AFAIK every such push would have to be checked at runtime to detect a stack overflow, so it seems there is some cost attached to each and every push (something like array bounds checking). How exactly is this checking implemented?
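To make concrete what I mean by "a check on every push", here is a minimal sketch of a software stack with an explicit bounds check. This is only my mental model of the cost, not a claim about how Windows or the CPU actually does it; the names `checked_stack`, `checked_push` and `STACK_CAPACITY` are made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_CAPACITY 16  /* hypothetical size, chosen only for this example */

typedef struct {
    uint8_t bytes[STACK_CAPACITY];
    size_t  top;            /* number of bytes currently on the stack */
} checked_stack;

/* Push one byte; returns false instead of writing past the end. */
static bool checked_push(checked_stack *s, uint8_t value)
{
    assert(s != NULL);
    if (s->top >= STACK_CAPACITY) {   /* the per-push comparison I am asking about */
        return false;                 /* overflow detected, nothing is written */
    }
    s->bytes[s->top++] = value;
    return true;
}
```

If the real stack worked like this, every single push would pay for that comparison and branch, which is the cost I am wondering about.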
On old machines, as I remember, there was no checking: the stack pointer just wrapped around (some FFF became 000), so there was no cost of checking. But today on the Windows platform it seems to me that probably every stack is bounds-checked, and I do not know how that is implemented.