asm x86_64 Intel Linux - Move RDX:RAX into XMM0

Question

I'm using the `rdtsc` instruction, and I know that it stores the high quadword in RDX and the low quadword in RAX (RDX:RAX). I want to do arithmetic with this value (subtracting two timestamps), so I need to move RDX:RAX into a 128-bit register (xmm0).

The `movq` instruction works, but only for the low quadword (`movq xmm0, rax`).

Is it possible to move RDX into the upper 64 bits of xmm0 (the second element of v2_int64)?

You do realize that you don't need to use xmm registers to do 128-bit subtraction. You can do it in regular registers with a sub followed by sbb. That'll probably be faster than moving into and out of xmm registers. — Raymond Chen, Mar 06 '22 at 19:58
Thank you for your answer, but I'm new to assembly programming and I don't know how to use this. Could you give me an example? — 0xDEADBEEF, Mar 06 '22 at 20:00
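The `sub`/`sbb` pair suggested above could look roughly like the following. This is a minimal sketch in NASM syntax, not code from the thread: the labels `a_lo`, `a_hi`, `b_lo`, `b_hi` and their values are made up for illustration, and the build line assumes `nasm` and `ld` on x86-64 Linux.

```nasm
; Assumed build: nasm -felf64 sub128.asm && ld -o sub128 sub128.o
; Computes a - b for two 128-bit values held as pairs of 64-bit halves.
; The operand values are illustrative, not real timestamps.

section .data
a_lo:   dq 0x0000000000000000       ; low  64 bits of a
a_hi:   dq 0x0000000000000002       ; high 64 bits of a
b_lo:   dq 0x0000000000000001       ; low  64 bits of b
b_hi:   dq 0x0000000000000000       ; high 64 bits of b

section .text
global _start
_start:
        mov     rax, [rel a_lo]
        mov     rdx, [rel a_hi]     ; rdx:rax = a
        sub     rax, [rel b_lo]     ; subtract low halves; CF is set on borrow
        sbb     rdx, [rel b_hi]     ; subtract high halves and the borrow in CF
        ; rdx:rax now holds a - b (here rdx = 1, rax = 0xFFFFFFFFFFFFFFFF)

        mov     edi, edx            ; exit status = low bits of the high half (1)
        mov     eax, 60             ; SYS_exit on x86-64 Linux
        syscall
```

The `sbb` has to follow the `sub` with no flag-modifying instruction in between, since the borrow travels through the carry flag.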
You can use `pinsrq` if you have SSE 4.1. If not, use `movq` and then `punpcklqdq`. — fuz, Mar 06 '22 at 20:26
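Both variants mentioned in this comment are sketched below in NASM syntax. The constants loaded into RAX and RDX are stand-ins for real data, and the final comparison is only there to show that the two sequences build the same 128-bit value. The build line is an assumption about the toolchain.

```nasm
; Assumed build: nasm -felf64 pack128.asm && ld -o pack128 pack128.o
; Packs RDX:RAX into an XMM register in two ways and checks they agree.

section .text
global _start
_start:
        mov     rax, 0x1111111111111111 ; stand-in for the low quadword
        mov     rdx, 0x2222222222222222 ; stand-in for the high quadword

        ; Variant 1 (needs SSE4.1): insert RDX as element 1
        movq    xmm0, rax               ; xmm0[63:0] = rax, xmm0[127:64] = 0
        pinsrq  xmm0, rdx, 1            ; xmm0[127:64] = rdx

        ; Variant 2 (SSE2 only): interleave two low quadwords
        movq    xmm1, rax
        movq    xmm2, rdx
        punpcklqdq xmm1, xmm2           ; xmm1[127:64] = xmm2[63:0] = rdx

        ; Sanity check: exit status 0 if both registers match
        pcmpeqd xmm0, xmm1              ; each dword becomes all ones where equal
        pmovmskb eax, xmm0              ; one mask bit per byte, 0xFFFF if all equal
        xor     edi, edi
        cmp     eax, 0xFFFF
        setne   dil                     ; 0 = match, 1 = mismatch
        mov     eax, 60                 ; SYS_exit on x86-64 Linux
        syscall
```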
Will you ever need anything more than `movq xmm0, rax`? A 64-bit number is a very, very big number. When is the `RDX` register ever going to be anything else than 0? — Sep Roland, Mar 06 '22 at 20:41
Apparently `rdtsc` still returns in `edx:eax` in 64-bit mode, so the higher half of `rax` will always be zeroed. Reference: https://stackoverflow.com/questions/17401914/why-should-i-use-rdtsc-differently-on-x86-and-x86-x64 **ETA:** Better source: https://www.felixcloutier.com/x86/rdtsc "(On processors that support the Intel 64 architecture, the high-order 32 bits of each of RAX and RDX are cleared.)" — ecm, Mar 06 '22 at 21:14
Having a 128-bit number in an XMM register isn't even going to help you subtract it. The widest SIMD element size for arithmetic is 64-bit, like `psubq`, and unlike scalar it doesn't produce carry/borrow output. — Peter Cordes, Mar 06 '22 at 21:57
Possibly related: [RDTSCP in NASM always returns the same value (timing a single instruction)](https://stackoverflow.com/q/54621381) . Also, for short intervals (under 1 second on a 4.3GHz CPU or slower), you can just discard the high bits of the TSC. — Peter Cordes, Mar 06 '22 at 22:30
[How can I count how many clock cycles it takes for the rdtsc instruction to execute?](https://stackoverflow.com/q/13262203) shows how to use sub/sbb to subtract timestamps, although merging into a 64-bit integer reg makes sense in 64-bit mode if you want the full thing. Saving two halves separately ties up more regs and takes two mov instructions, SHL/LEA can get the merging done also in 2 fairly efficient instructions. — Peter Cordes, Mar 06 '22 at 22:34
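Since `rdtsc` returns its result in EDX:EAX with the upper halves of RAX and RDX cleared, the merge described here can be sketched as below. This is a rough illustration in NASM syntax rather than code from the linked answers; the choice of RSI as the scratch register is arbitrary, and the sketch leaves out any serialization around `rdtsc`.

```nasm
; Assumed build: nasm -felf64 tsc.asm && ld -o tsc tsc.o
; Reads the TSC twice, merges each EDX:EAX pair into one 64-bit register,
; and subtracts with a single 64-bit sub.

section .text
global _start
_start:
        rdtsc                       ; edx:eax = TSC, upper 32 bits of rax/rdx are zero
        shl     rdx, 32             ; move the high half into bits 63:32
        or      rax, rdx            ; rax = full 64-bit start timestamp
                                    ; (lea rax, [rax+rdx] also works: the halves do not overlap)
        mov     rsi, rax            ; keep the start value; rdtsc does not modify rsi

        ; ... code to be timed would go here ...

        rdtsc
        shl     rdx, 32
        or      rax, rdx            ; rax = full 64-bit end timestamp
        sub     rax, rsi            ; rax = elapsed TSC ticks

        ; If the value is wanted in an XMM register afterwards:
        movq    xmm0, rax           ; xmm0[63:0] = elapsed ticks, upper half zeroed

        xor     edi, edi
        mov     eax, 60             ; SYS_exit on x86-64 Linux
        syscall
```

For short intervals, as noted above, `sub eax, esi` on the low halves alone would be enough, because the difference fits in 32 bits.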
