1

My code is very simple:

global start
extern printf, Sleep

section .data:
    string db 'End: %llu', 0Ah, 'Start: %llu', 0

section .text:
start:
    mfence
    lfence
    rdtsc
    sub rsp, 8
    mov [rsp + 4], eax
    mov [rsp], edx
    mov rcx, 5000 ; Sleep for 5 seconds
    sub rsp, 32
    call Sleep
    add rsp, 32
    rdtscp
    pop rbx
    shl rdx, 32
    mov edx, eax
    ;RDX is end value
    ;RBX is start value
    mov r8, rbx
    mov rcx, string
    sub rsp, 40
    call printf
    add rsp, 40
    xor rax, rax
    ret

I am using the RDTSC instruction to time a piece of code (in this case the WinAPI Sleep() function because it makes things clearer), and the mfence + lfence pair for serialization. I ran the program 3 times and I got this output:

//1
End: 3717167211
Start: 12440347256463305328

//2
End: 2175818097
Start: 5820054112011561610

//3
End: 4070965503
Start: 13954488533004593819

From what I understand, RDTSC should always output increasing results, so I don't get why in test 2 the end is smaller than the start value.

Anyway, my goal is to output the amount of seconds the function actually took to execute. I would guess that I need to take the difference between the end and start values and then divide the result by the CPU frequency, but I don't know how to do that. Can anybody help?

P.S. I don't need answers about external libraries such as the CRT clock() as I already know how to use such functions. My point here is to learn how to use the RDTSC instruction.

DarkAtom
  • 2,589
  • 1
  • 11
  • 27
  • The code only uses 32 bits of the 64-bit value it generates into edx:eax. So overflow is quite likely when the start value is already close to 4 billion. The actual value is not interesting, only the difference. Use SUB. – Hans Passant Mar 10 '20 at 20:55
  • @HansPassant Wait, I thought that in x64 `RDTSC` outputs in 64-bit RAX, not EDX:EAX. Still, what should I do with the difference? – DarkAtom Mar 10 '20 at 20:56
  • 2
    RAX only has the low 32 bits of the 64-bit TSC (yes, even in 64-bit mode, unfortunately). Read the manual: https://www.felixcloutier.com/x86/rdtsc. But fortunately, the 32-bit difference still works and ignoring the high bits is fine as long as the difference is smaller enough that you don't have aliasing (missing a whole multiple of 2^32). – Peter Cordes Mar 10 '20 at 20:57
  • Also, no, you don't divide by the CPU frequency, you divide by the *reference* frequency. This is not always exactly the same as the CPU base frequency, but e.g. on my i7-6700k it's 4008MHz while the base frequency is 4.0GHz. And its definitely not the same as the CPU's current core clock frequency. See also [How to get the CPU cycle count in x86\_64 from C++?](https://stackoverflow.com/a/51907627) for more details. Also, you fenced wrong: you need `lfence; rdtsc` at the end of your timed region, or rdtscp. (Following either of those with lfence can be useful) – Peter Cordes Mar 10 '20 at 21:00
  • I have edited the code to include the EDX:EAX and the RDTSCP but now the start value is always bigger :( – DarkAtom Mar 10 '20 at 21:09
  • 1
    re: your edit: x86 is little-endian but you stored the low half (EAX) at a higher address (RSP+4). That's backwards. Also, `mov edx, eax` zero-extends EAX into RDX; use 64-bit `or` not 32-bit `mov`. [Why do x86-64 instructions on 32-bit registers zero the upper part of the full 64-bit register?](https://stackoverflow.com/q/11177137) That's why your end value is just the low half truncated. Print in hex and it'll be easier to catch these mistakes. – Peter Cordes Mar 10 '20 at 21:11
  • Thank you! It now works. I didn't know that if you `mov edx, eax`, it fills the higher 32 bits of rdx with 0. Good to know! – DarkAtom Mar 10 '20 at 21:15
  • See [clflush to invalidate cache line via C function](https://stackoverflow.com/q/51818655) for an example of fencing RDTSC for better repeatability of measurements. The other 3 duplicates I linked this to explain what's wrong with your current code for combining RDTSC results, and about RDTSC in general. – Peter Cordes Mar 10 '20 at 21:25

0 Answers0