In user-space code for mainstream x86-64 OSes, you can normally assume that the upper bits of any valid address are zero.
AFAIK, all the mainstream x86-64 OSes use a high-half kernel design where user-space addresses are always in the lower canonical range.
If you wanted this code to work in kernel code, too, you would want to sign-extend with x <<= 16; x >>= 16; using signed int64_t x.
If the compiler can't keep 0x0000FFFFFFFFFFFF = (1ULL<<48)-1 around in a register across multiple uses, 2 shifts might be more efficient anyway. (mov r64, imm64 to create that wide constant is a 10-byte instruction that can sometimes be slow to decode or fetch from the uop cache.) But if you're compiling with -march=haswell or newer, then BMI2 is available so the compiler can do mov eax, 48 / bzhi rsi, rdi, rax. Either way, though, one AND or BZHI is only 1 cycle of critical path latency for the pointer vs. 2 for 2 shifts. Unfortunately BZHI isn't available with an immediate operand. (x86 bitfield instructions mostly suck compared to ARM or PowerPC.)
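As a rough illustration of the mask approach, here is a minimal sketch, assuming user-space pointers with 48-bit virtual addresses. The names kAddrMask, pack, address_of and tag_of are made up for this example, not taken from your code.

```cpp
#include <cstdint>

// Assumption: user-space x86-64 with 48-bit virtual addresses, so
// bits [63:48] of every valid address are zero. All names are hypothetical.
constexpr std::uint64_t kAddrMask = (1ULL << 48) - 1;  // 0x0000FFFFFFFFFFFF

// Store a 16-bit tag in the otherwise unused bits [63:48].
constexpr std::uint64_t pack(std::uint64_t addr, std::uint16_t tag) {
    return (addr & kAddrMask) | (static_cast<std::uint64_t>(tag) << 48);
}

// Zero-extend the low 48 bits: one AND (or BZHI) on the critical path.
constexpr std::uint64_t address_of(std::uint64_t packed) {
    return packed & kAddrMask;
}

// The tag is just the top 16 bits.
constexpr std::uint16_t tag_of(std::uint64_t packed) {
    return static_cast<std::uint16_t>(packed >> 48);
}
```

Whether the compiler materializes the constant with mov r64, imm64 or uses BZHI depends on the target options, so check the generated asm for your build.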
Your current method of extracting bits [55:48] and using them to replace the current bits [63:56] is probably slower because the compiler has to mask out the old high byte and then OR in the new high byte. That's already at least 2 cycles of latency, so you might as well just shift, or mask, which can be faster.
x86 has crap bitfield instructions so that was never a good plan. Unfortunately ISO C++ before C++20 doesn't provide any guaranteed arithmetic right shift, but on all actual x86-64 compilers, >> on a signed integer is a 2's complement arithmetic shift. If you want to be really careful about avoiding UB, do the left shift on an unsigned type to avoid signed integer overflow. int64_t is guaranteed to be a 2's complement type with no padding if it exists.
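A minimal sketch of the careful version, assuming 48-bit virtual addresses: left shift on an unsigned type, then an arithmetic right shift on int64_t. The function name canonicalize48 is hypothetical.

```cpp
#include <cstdint>

// Hypothetical helper: rebuild a canonical address from a value whose
// top 16 bits hold a tag, by sign-extending bit 47 into bits [63:48].
inline std::uint64_t canonicalize48(std::uint64_t packed) {
    // Left shift on an unsigned type: well-defined, the tag bits fall off the top.
    std::uint64_t shifted = packed << 16;
    // Conversion to signed is modular on real x86-64 compilers
    // (implementation-defined before C++20).
    std::int64_t s = static_cast<std::int64_t>(shifted);
    // Arithmetic right shift copies bit 47 into the upper 16 bits
    // (what GCC, Clang and MSVC do on x86-64; guaranteed since C++20).
    return static_cast<std::uint64_t>(s >> 16);
}
```

This works for both halves of the canonical range, so the same code should be usable for kernel addresses as well as user-space ones.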
I think int64_t is actually a better choice than intptr_t, because if you have 32-bit pointers, e.g. the Linux x32 ABI (32-bit pointers in x86-64 long mode), your code might still Just Work, and casting a uint64_t to a pointer type will simply discard the upper bits. So it doesn't matter what you did to them, and zero-extension first will hopefully optimize away.
So your uint64_t member would just end up storing a pointer in the low 32 and your tag bits in the high 32, somewhat inefficiently but still working. Maybe check sizeof(void*) in a template to select an implementation?
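One way that sizeof(void*) selection could look, as a sketch only. PtrBits and to_pointer are hypothetical names, and the 48-bit mask assumes user-space addresses.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical trait: pick the address-recovery code from the pointer width.
template <std::size_t PtrSize = sizeof(void*)>
struct PtrBits;  // only the 8-byte and 4-byte cases are defined below

// 64-bit pointers: assume 48-bit user-space addresses, tag in bits [63:48].
template <>
struct PtrBits<8> {
    static constexpr unsigned tag_shift = 48;
    static constexpr std::uint64_t address(std::uint64_t packed) {
        return packed & ((1ULL << 48) - 1);
    }
};

// 32-bit pointers (e.g. the x32 ABI): pointer in the low 32, tag in the high 32.
template <>
struct PtrBits<4> {
    static constexpr unsigned tag_shift = 32;
    static constexpr std::uint64_t address(std::uint64_t packed) {
        return packed & 0xFFFFFFFFULL;
    }
};

// Convert the stored value back to a pointer for the current target.
template <class T>
T* to_pointer(std::uint64_t packed) {
    return reinterpret_cast<T*>(
        static_cast<std::uintptr_t>(PtrBits<>::address(packed)));
}
```

On a target with some other pointer size, PtrBits<> is an incomplete type, so the build fails instead of silently doing the wrong thing.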
Future proofing
x86-64 CPUs with 5-level page tables for 57-bit canonical addresses are probably coming at some point soonish, to allow use of large memory mapped non-volatile storage like Optane / 3DXPoint NVDIMMs.
Intel has already published a proposal for a PML5 extension https://software.intel.com/sites/default/files/managed/2b/80/5-level_paging_white_paper.pdf (see https://en.wikipedia.org/wiki/Intel_5-level_paging for a summary). There's already support for it in the Linux kernel so it's ready for the appearance of actual HW.
(I can't find out if it's expected in Ice Lake or not.)
See also Why in 64bit the virtual address are 4 bits short (48bit long) compared with the physical address (52 bit long)? for more about where the 48-bit virtual address limit comes from.
So you can still use the high 7 bits for tagged pointers and maintain compat with PML5.
If you assume user-space, then you can use the top 8 bits and zero-extend, because you're assuming the 57th bit (bit 56) = 0.
Redoing sign- (or zero-) extension of the low bits was already optimal, we're just changing it to a different width that only re-extends the bits we disturb. And we're disturbing few enough high bits that it should be future proof even on systems that enable PML5 mode and use wide virtual addresses.
On a system with 48-bit virtual addresses, broadcasting bit 56 (the 57th bit) to the upper 7 still works, because in a canonical 48-bit address bit 56 = bit 47. And if you don't disturb those lower bits, they don't need to be re-written.
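A sketch of the PML5-compatible variant, assuming you only use the top 7 bits [63:57] for the tag and re-extend from bit 56 (the 57th bit). The names pack7, tag7 and canonicalize57 are made up for this example.

```cpp
#include <cstdint>

// Hypothetical helpers: 7-bit tag in bits [63:57], address bits [56:0] untouched.
inline std::uint64_t pack7(std::uint64_t addr, unsigned tag) {
    const std::uint64_t low57 = (1ULL << 57) - 1;
    return (addr & low57) | (static_cast<std::uint64_t>(tag & 0x7Fu) << 57);
}

inline unsigned tag7(std::uint64_t packed) {
    return static_cast<unsigned>(packed >> 57);
}

// Sign-extend bit 56 into bits [63:57]. With 48-bit addresses, bit 56 already
// equals bit 47, so this gives the same result as re-extending from bit 47.
inline std::uint64_t canonicalize57(std::uint64_t packed) {
    std::int64_t s = static_cast<std::int64_t>(packed << 7);  // unsigned shift first
    return static_cast<std::uint64_t>(s >> 7);                // arithmetic shift on x86-64
}
```

If you assume user-space only, a plain AND with the low-56-bit mask and an 8-bit tag would be the zero-extending equivalent.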
And BTW, your GetUID() returns an integer. It's not clear why you need that to return the static address.
It may also be cheaper for it to return &uid (just a RIP-relative LEA) than to load + re-canonicalize your m member value. Move static const uint64_t uid = 0xBEA57; to a static member variable instead of being within one member function.
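Roughly what that could look like. This is only a sketch: the class name Tagged and the accessor GetUIDAddress are hypothetical, since I'm guessing at the shape of your class.

```cpp
#include <cstdint>

// Hypothetical class; only the uid handling is the point here.
struct Tagged {
    // Static data member instead of a function-local static.
    static const std::uint64_t uid;

    // Returning the address is just a RIP-relative LEA on x86-64.
    static const std::uint64_t* GetUIDAddress() { return &uid; }

    // Returning the value needs no address at all.
    static std::uint64_t GetUID() { return uid; }
};

// Out-of-line definition so that &uid refers to a real object.
const std::uint64_t Tagged::uid = 0xBEA57;
```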