This thread has a good list of the times it takes to access various parts of the computer architecture in a uniprocessor environment. What about a dual-processor environment, where the access goes over Intel's QPI bus?
Let's assume the memory for a 64-byte packet is allocated on the first CPU. The second CPU has to access it via an 8.0 GT/s QPI bus, so I know the serialization latency alone is ~4 ns. What additional latency should I expect on the QPI bus?
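For reference, this is roughly how I arrive at the ~4 ns figure. It is only a back-of-the-envelope sketch: it assumes a full-width QPI link carrying 2 bytes of payload per transfer in each direction, and it ignores header flits, CRC, and any protocol overhead. The function name and the `bytes_per_transfer` parameter are mine, purely for illustration.

```python
def serialization_latency_ns(payload_bytes, transfers_per_sec, bytes_per_transfer=2):
    """Time to clock payload_bytes onto the link, in nanoseconds.

    Assumption: bytes_per_transfer=2 models 16 data bits per transfer
    in one direction of a full-width link; protocol overhead is ignored.
    """
    bandwidth_bytes_per_sec = transfers_per_sec * bytes_per_transfer
    return payload_bytes * 1e9 / bandwidth_bytes_per_sec


# 64-byte packet over an 8.0 GT/s link
latency = serialization_latency_ns(64, 8.0e9)
print(f"serialization latency: {latency:.1f} ns")
```

This covers only the time to push the bytes across the wire; the part I am asking about is everything else on top of it.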