Suppose you come across the instruction mov rax, [rsi]
while debugging.
So, how is this instruction actually executed?
In the best case, can the data at the address held in the rsi
register be served from the L1 cache?
Or, once the address held in rsi
has been translated, could the data have to be read from disk or the network rather than from main memory?
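To make the disk scenario concrete, here is a minimal sketch, assuming a memory-mapped file is the kind of case being asked about. The helper name load_through_mapping is hypothetical and only for illustration. Indexing the mapping is an ordinary memory read from the program's point of view; whether the kernel satisfies it from the page cache or has to go to storage is not visible in the code.

```python
import mmap
import os
import tempfile


def load_through_mapping(payload: bytes) -> int:
    """Write payload to a temporary file, map it, and return its first byte.

    Hypothetical helper for illustration: the byte is fetched by indexing the
    mapping (a plain memory access), not by an explicit read() call.
    """
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "r+b") as f:
            f.write(payload)
            f.flush()
            # Length 0 maps the whole file; the file must not be empty.
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
                # This access may trigger a page fault that the kernel
                # resolves from the page cache or from the backing storage.
                return mm[0]
    finally:
        os.remove(path)


first_byte = load_through_mapping(b"hello")
print(first_byte)  # 104, the byte value of "h"
```

In this sketch the file was just written, so the page is very likely still cached in memory; a cold mapping of a large file is where a single load could wait on the disk.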
What is the worst case for executing mov rax, [rsi], given the following latency figures?
Latency Comparison Numbers
L1 cache reference
Branch mispredict
L2 cache reference
Mutex lock/unlock
Main memory reference
Compress 1K bytes with Zippy
Send 1K bytes over 1 Gbps network
Read 4K randomly from SSD*
Read 1 MB sequentially from memory
Round trip within same datacenter
Read 1 MB sequentially from SSD*
Disk seek
Read 1 MB sequentially from disk
Send packet CA->Netherlands->CA
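The list above appears without its numeric column. As a rough sketch, the figures below are the approximate values usually quoted alongside this list (the widely circulated "Latency Numbers Every Programmer Should Know" table, circa 2012). They are an assumption added here, not part of the original text, and real hardware will differ. The helper slowdown_vs_l1 is a hypothetical name that expresses each entry as a multiple of an L1 hit, which is the comparison the question is really about.

```python
# Approximate, commonly quoted latencies in nanoseconds (assumed values,
# not measurements; they are not part of the original list).
LATENCY_NS = {
    "L1 cache reference": 0.5,
    "Branch mispredict": 5,
    "L2 cache reference": 7,
    "Mutex lock/unlock": 25,
    "Main memory reference": 100,
    "Compress 1K bytes with Zippy": 3_000,
    "Send 1K bytes over 1 Gbps network": 10_000,
    "Read 4K randomly from SSD": 150_000,
    "Read 1 MB sequentially from memory": 250_000,
    "Round trip within same datacenter": 500_000,
    "Read 1 MB sequentially from SSD": 1_000_000,
    "Disk seek": 10_000_000,
    "Read 1 MB sequentially from disk": 20_000_000,
    "Send packet CA->Netherlands->CA": 150_000_000,
}


def slowdown_vs_l1(name: str) -> float:
    """Return how many times slower the named operation is than an L1 hit."""
    return LATENCY_NS[name] / LATENCY_NS["L1 cache reference"]


if __name__ == "__main__":
    for name, ns in LATENCY_NS.items():
        print(f"{name:<36} {ns:>15,.1f} ns {slowdown_vs_l1(name):>15,.0f}x")
```

Under these assumed figures, a load that misses every cache and is served from main memory costs about 200 times an L1 hit, while one that has to wait on a disk seek costs on the order of 20 million times an L1 hit.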