I'm using the `rdtsc` instruction, and I know that it stores the high quadword in RDX and the low quadword in RAX (RDX:RAX). I want to do arithmetic with the result (subtraction of two timestamps), so I need to move RDX:RAX into a 128-bit register (xmm0).

The `movq` instruction works, but only for the low quadword (`movq xmm0, rax`).

Is it possible to move RDX into the upper 64 bits of xmm0 (the second element of `v2_int64`)?

0xDEADBEEF
  • 2
    You do realize that you don't need to use xmm registers to do 128-bit subtraction. You can do it in regular registers with a sub followed by sbb. That'll probably be faster than moving into and out of xmm registers. – Raymond Chen Mar 06 '22 at 19:58
  • Thank you for your answer, but I'm new to assembly programming and I don't know how to use this. Could you give me an example? – 0xDEADBEEF Mar 06 '22 at 20:00
  • 3
    Do not post pictures of text! – fuz Mar 06 '22 at 20:25
  • 1
    You can use `pinsrq` if you have SSE 4.1. If not, use `movq` and then `punpcklqdq`. – fuz Mar 06 '22 at 20:26
  • 1
    Will you ever need anything more than `movq xmm0, rax`? A 64-bit number is a very, very big number. When is the `RDX` register ever going to be anything else than 0? – Sep Roland Mar 06 '22 at 20:41
  • 4
    Apparently `rdtsc` still returns in `edx:eax` in 64-bit mode, so the higher half of `rax` will always be zeroed. Reference: https://stackoverflow.com/questions/17401914/why-should-i-use-rdtsc-differently-on-x86-and-x86-x64 **ETA:** Better source: https://www.felixcloutier.com/x86/rdtsc "(On processors that support the Intel 64 architecture, the high-order 32 bits of each of RAX and RDX are cleared.)" – ecm Mar 06 '22 at 21:14
  • 2
    Having a 128-bit number in an XMM register isn't even going to help you subtract it. The widest SIMD element size for arithmetic is 64-bit, like `psubq`, and unlike scalar it doesn't produce carry/borrow output. – Peter Cordes Mar 06 '22 at 21:57
  • Possibly related: [RDTSCP in NASM always returns the same value (timing a single instruction)](https://stackoverflow.com/q/54621381) . Also, for short intervals (under 1 second on a 4.3GHz CPU or slower), you can just discard the high bits of the TSC. – Peter Cordes Mar 06 '22 at 22:30
  • [How can I count how many clock cycles it takes for the rdtsc instruction to execute?](https://stackoverflow.com/q/13262203) shows how to use sub/sbb to subtract timestamps, although merging into a 64-bit integer reg makes sense in 64-bit mode if you want the full thing. Saving two halves separately ties up more regs and takes two mov instructions; SHL/LEA can also get the merging done in 2 fairly efficient instructions. – Peter Cordes Mar 06 '22 at 22:34
  • Thanks a lot for all of your feedback, it is so helpful! – 0xDEADBEEF Mar 06 '22 at 22:36
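
The scalar approach suggested in the comments can be sketched in C as follows. This is only an illustration under stated assumptions, not a definitive implementation: the names `merge_tsc`, `sub128`, and `u128` are hypothetical, and the assembly shown in the comments is a rough equivalent (it uses `shl`/`or` for the merge; the `shl`/`lea` variant mentioned above works along the same lines).

```c
#include <stdint.h>

/* Hypothetical helper: combine the two 32-bit halves that rdtsc leaves in
   EDX:EAX into one 64-bit timestamp. Roughly equivalent to:
       shl rdx, 32
       or  rax, rdx                                                      */
static uint64_t merge_tsc(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | lo;
}

/* Hypothetical 128-bit value kept as two 64-bit halves, like RDX:RAX. */
typedef struct { uint64_t lo, hi; } u128;

/* 128-bit subtraction with an explicit borrow, mirroring:
       sub rax, rcx    ; low halves, sets CF if a borrow is needed
       sbb rdx, rbx    ; high halves, minus that borrow                  */
static u128 sub128(u128 a, u128 b)
{
    u128 r;
    r.lo = a.lo - b.lo;
    uint64_t borrow = a.lo < b.lo;   /* 1 if the low half wrapped around */
    r.hi = a.hi - b.hi - borrow;
    return r;
}
```

With the sample halves `merge_tsc(0xFFFFFFF0u, 1u)` as the start and `merge_tsc(0x10u, 2u)` as the end, a plain 64-bit subtraction gives 32. Since `rdtsc` only ever fills 32 bits of each register, the merged 64-bit form is enough for timestamps; `sub128` is shown only to illustrate what `sub` followed by `sbb` does.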

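
For completeness, the `movq` + `punpcklqdq` route from the comments can be sketched with SSE2 intrinsics in C. This is a hedged illustration for x86-64 compilers only; the function names `pack_rdx_rax` and `psubq_lanes` are hypothetical, and the instruction named beside each intrinsic is the one it typically maps to. The second function illustrates the point made above that `psubq` subtracts two independent 64-bit lanes and does not carry a borrow from the low lane into the high one.

```c
#include <stdint.h>
#include <emmintrin.h>   /* SSE2 intrinsics, baseline on x86-64 */

/* Hypothetical helper: place lo in bits 0..63 and hi in bits 64..127. */
static __m128i pack_rdx_rax(uint64_t lo, uint64_t hi)
{
    __m128i vlo = _mm_cvtsi64_si128((long long)lo);  /* movq xmm0, rax */
    __m128i vhi = _mm_cvtsi64_si128((long long)hi);  /* movq xmm1, rdx */
    return _mm_unpacklo_epi64(vlo, vhi);             /* punpcklqdq xmm0, xmm1 */
}

/* Hypothetical helper: subtract lane by lane, as psubq does.
   out[0] receives the low lane, out[1] the high lane.         */
static void psubq_lanes(uint64_t a_lo, uint64_t a_hi,
                        uint64_t b_lo, uint64_t b_hi,
                        uint64_t out[2])
{
    __m128i d = _mm_sub_epi64(pack_rdx_rax(a_lo, a_hi),
                              pack_rdx_rax(b_lo, b_hi));  /* psubq */
    _mm_storeu_si128((__m128i *)out, d);
}
```

Subtracting the pair (lo = 1, hi = 0) from (lo = 0, hi = 1) this way leaves the high lane at 1, whereas a true 128-bit subtraction would borrow and leave 0 there. That is why packing RDX:RAX into an XMM register does not by itself give a 128-bit subtraction.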