I'm trying to write a `sleep`-like function for a simple kernel I made in C that runs on a virtual machine. It uses a loop in which I want each iteration to take as close to 1 nanosecond as possible, which is why I decided to write it in assembly. My CPU is a Sandy Bridge clocked at 2.7GHz, so 1 clock cycle is 10/27ns (about 0.37ns) and 1ns is 2.7 clock cycles. I looked up the throughput of the instructions I needed in the Sandy Bridge section of Agner Fog's instruction tables. Here are the ones I needed (latency included just in case):
Instruction | Operands | Latency | Reciprocal throughput |
---|---|---|---|
NOP | | | 0.25 |
ADD SUB | mem, reg/imm | 6 | 1 |
Cond. jump | short/near | 0 | 1-2 |
Since `jcc` has a reciprocal throughput of 1-2 clock cycles, I averaged it to 1.5. My function looks like this:

    BITS 32
    section .text
    global sleep_for

    ; void sleep_for(unsigned n)
    sleep_for:
        nop                     ; 2.70 - 0.25 = +2.45
        sub DWORD [esp+4], 1    ; 2.45 - 1.00 = +1.45
        jnz short sleep_for     ; 1.45 - 1.50 = -0.05
        ret                     ; Each iteration = 2.75 (1.01851852ns)

The comments describe what I think should be happening, and the estimate comes out close to 1ns per iteration. But when I call `sleep_for(1000000000)` (which should sleep for 1s) from my C program, it ends up waiting for about 4.35s instead. Why does the function wait so much longer than it should?