On my E8200 box this doesn't occur, but on my Atom N450 netbook (both running openSUSE 11.2), whenever I read the CPU's TSC, the returned value mod 10 is 0, i.e. it is divisible by 10 without remainder. I'm using the RDTSC value to measure the time that interesting pieces of code take, but for the purpose of demonstration I've made up this little program:
.text
.global _start
_start: xorl %ebx,%ebx
xorl %ecx,%ecx
xorl %r14d,%r14d
movb $10,%cl
loop: xchgq %rcx,%r15 # save to reg
cpuid
rdtsc
shlq $32,%rdx
xorq %rax,%rdx # full 64 bit of RDTSC
movq %r14,%r13 # save the old value
movq %rdx,%r14 # copy current
movq %r14,%rsi # printf() arg 1: current TSC value
subq %r13,%rdx # printf() arg 2: delta to the previous value
leaq format(%rip),%rdi # printf() arg 0: format string
xorl %eax,%eax # %al = 0: no vector registers used for varargs
call printf
xchgq %rcx,%r15
loop loop
0: xorl %eax,%eax
movb $0x3c,%al
syscall
.size _start, .-_start
.data
format: .asciz "rdtsc: %#018llx = %1$llu -- delta: %llu\n"
(I usually use my own routines for converting, but to prevent readers from suggesting that the error might be there, I'm just using printf() here.)
With the above code, the output is (for example):
rdtsc: 0x000b88ef933ffd06 = 3246787292822790 -- delta: 3246787292822790
rdtsc: 0x000b88ef9342fcf4 = 3246787293019380 -- delta: 196590
rdtsc: 0x000b88ef93435dca = 3246787293044170 -- delta: 24790
rdtsc: 0x000b88ef9343b43c = 3246787293066300 -- delta: 22130
rdtsc: 0x000b88ef93440c34 = 3246787293088820 -- delta: 22520
rdtsc: 0x000b88ef9344604e = 3246787293110350 -- delta: 21530
rdtsc: 0x000b88ef9344b4d6 = 3246787293131990 -- delta: 21640
rdtsc: 0x000b88ef9345085a = 3246787293153370 -- delta: 21380
rdtsc: 0x000b88ef93455d96 = 3246787293175190 -- delta: 21820
rdtsc: 0x000b88ef9345b16a = 3246787293196650 -- delta: 21460
As can be easily seen, the delta varies in reasonable amounts. But conspicuous (not to say conspired ;-) is that the least significant decimal digit is always 0.
I've observed this phenomenon for more than two years now, and Stack Overflow is not the first place where I've made this issue public, but I haven't got a reasonable answer anywhere yet. The ideas we (other people out there and I) have come up with are that
- the TSC is incremented only every 10th cycle, but then by 10, or
- the TSC is internally updated correctly, but reflected to the outside only every 10th cycle, or
- the TSC is incremented by 10 each cycle.
None of these points really makes sense, however. I should actually have run a program like that on the E8200 (which is currently out of order) to see whether the deltas there are of the same order of magnitude or only a tenth of those in the above output. (Any volunteers?)
Googling didn't help, and neither did Intel's manuals.
When I discussed this with other people, no one else had experienced the same behaviour. If it had to do with the kernel, then at least 3 versions were affected, but then... what does the kernel have to do with it?
I've also had the netbook in for service, and it came back with a new motherboard, which implies a new CPU, so at least two individual N450 units must be affected.
I've also taken measures against clock frequency changes (no matter what frequency I fixed the clock to, the values varied only within the expected range, the same as shown) and switched off HT, though frequency changes and HT should, if anything, produce other least significant digits rather than prevent them. I did it just to be sure.
Well, if anyone wants to run the program on their machine, the command line is (provided you save the source in a file rdtsc.s):
as rdtsc.s -o rdtsc.o
ld --dynamic-linker=/lib64/ld-linux-x86-64.so.2 rdtsc.o -L /lib64 -l c -o rdtsc
In order to build it with the gcc frontend, i.e.
gcc -l c rdtsc.s -o rdtsc
you must add a main: label (or replace the _start: label with it) and make it global.
[update (2012-09-15 ~21:15 UTC): Actually, I could have done this before: I just let it take the TSC before and after a sleep(1), which gives a delta slightly greater than 1,666,000,000. This shows that the third point in the list above is wrong. But I still have no idea why I don't get the full precision. /update]