4

Edit: The reason I am asking

I think I have to clarity that the platform I am using and the reason I am asking to make it easier for you to answer this question: I am using a x86_64 machine with gcc and Ubuntu and I am working on some interpreter of some toy language and I think tagged pointer is a neat trick can be used. I know Apple is using it. So I just want to try it out.


I was reading something about tagged pointer and I was wondering, how can I know how many free bits are there in a pointer on a particular machine.

For now my understanding is that if I am using a 64 bit machine, then when accessing memory the CPU will always access memory address that are multiple of 8 bytes. So it leaves 2 bits in the end of a pointer always set to 0. Also, if on a x86_64 machine, the first 14 bits will be always 0 right? Since they are never used by the CPU. malloc will make sure that the pointers it gives back will always be aligned. But how about other memory locations? Say variables on the stack?

How can I confirm this?

Somebody in the comments suggested that 2 bits I mentioned in the above is not right, indicating I am a bad programmer. I do not deny that I am not a very professional programmer, but I think I shall explain a bit a bout why I said 2 instead of 3.

I wrote a very simple program like this:

 #include <stdio.h>
 #include <stdlib.h>
 int main() {
    int a = 0;
    printf("%p\n", &a);
    int *p = malloc(sizeof(int));
    printf("%p\n", p);
 }

And I compiled it with gcc and run it over 10000 iteration on a 64 bit machine with Ubuntu. I discovered that &a always end up with last 4 bits as 1100 and p always end up with 0000, so I want to be conservative about how many bits are actually unused by the compiler. That's why I said 2 instead of 3.

Also if you can help me explain what I have been observed (&a ends up with 1100, which only has 2 bits set to 0), I would deeply appreciate it.

Community
  • 1
  • 1
Bob Fang
  • 6,963
  • 10
  • 39
  • 72
  • 2
    If you're using this information to try to use spare pointer bits for something it isn't meant for, I'd recommend against it. – Mysticial Dec 30 '14 at 21:00
  • 2
    There are no "free" bits. Some bits may always be zero, but that doesn't mean they're insignificant. Also, it's not true that "the CPU will always access memory address that are multiple of 8 bytes." First, the CPU has nothing to do with it; it's what the compiler does. And while the compiler may place class and struct members on 8-byte boundaries, that doesn't mean that any pointer can only address 8-byte blocks. For instance, obviously a char pointer needs to be able to access any byte in an 8-bit string. So all those bits are important. – Dan Korn Dec 30 '14 at 21:06
  • 4
    Also note that "first 14 bits are always zero" is certainly not true for x86. That would leave a mere 256k of address space. Something similar is true for x86_64 though. 12 bits are (for the foreseeable future) always zero, currently it's de-facto 16 bits. – Damon Dec 30 '14 at 21:07
  • @Damon yeah I made a mistake there, I meant x86_64. – Bob Fang Dec 30 '14 at 21:19
  • 1
    most CPUs can access any address, however, addresses that are not a multiple of the bus width take more CPU cycles. There are NO unused bits in an address. If your planning on using any of those'unused' bits, that will not work as setting any of those bits changes that address value – user3629249 Dec 30 '14 at 21:19
  • 3
    As others have warned, it will certainly not be portable, and if you make mistakes in the implementation you may be in for a world of hurt. But that doesn't mean it's not a viable technique for competent programmers to implement, for example, an interpreter or a database engine. But given that you wrote about "8 bytes" and "2 bits" I should perhaps advise against it... – Thomas Padron-McCarthy Dec 30 '14 at 21:23
  • @ThomasPadron-McCarthy I edited the question a bit to explain why I said 2 while I know it should be 3. – Bob Fang Dec 30 '14 at 21:35
  • @ThomasPadron-McCarthy also if you can help me explain what I have been observed (``&a`` ends up with 1100, which only have 2 bits set to 0), I would deeply appreciate it. – Bob Fang Dec 30 '14 at 21:39
  • Please explain why are you asking. – Basile Starynkevitch Dec 30 '14 at 21:39
  • @BasileStarynkevitch I am implementing some interpreter on my own and I just thought this trick is quite fun. Apple used it I think, and I want to try it out. – Bob Fang Dec 30 '14 at 21:40
  • @BasileStarynkevitch I edit the question to explain why I am asking this question. – Bob Fang Dec 30 '14 at 21:46
  • @dorafon: Sorry if I sounded like I was doubting your programming skills. But as to why 00 and 0000: Depending on the architecture of your machine, it is probably more efficient or even necessary for integers to be aligned on 4-byte boundaries, so the compiler makes them so. (x86 doesn't require alignment, but Sparc did. You'd get a Bus error.) malloc can be used for any type of data, so it's returned areas must fulfil the strictest alignments, and it seems to be 16-byte boundaries. Or, alternatively, it could just be an accident of the compiler and the malloc library, respectively. – Thomas Padron-McCarthy Dec 30 '14 at 21:48
  • like others said, if you really need to use the free bits then [the low free bits are safer](https://stackoverflow.com/a/18426582/995714) – phuclv Mar 08 '20 at 14:54

5 Answers5

3

Many recommend against using tag bits in the pointers (and so do I).

If you insist on doing that, use only the two (or perhaps three) lowest bits of pointers, and preferably write your own malloc-like allocator to ensure it.

On modern x86-64 processors most pointers (e.g. those to malloc-ed zones, or those to word-aligned data) are generally word-aligned, and that mean they are multiple of 8 (since a 64 bit word has 8 bytes).

Actually it is not only processor, but also ABI specific. Some ABIs mandate a 16 byte aligned stack pointer (to help SSE or AVX), others require only 8 byte alignment.

Don't expect the high bits of addresses to be fixed. Indeed they usually are, but this is processor specific (could be slightly different on a high-end Intel Xeon and on a low-end FS1b AMD processor, and could be different in the near-future processors).

BTW, this is OS (and processor) specific. And take ASLR & VDSO into account.

Look e.g. inside the source code of bigloo, file runtime/Include/bigloo.h for an example of tagging.

If implementing your own interpreter, a related issue is the garbage collector. I would suggest to use Boehm's conservative GC; it is probably not the fastest or the best, but it is good enough (and thread-friendly). By experience (e.g. in MELT), debugging a GC is time-consuming.

Also, today, the memory is much more important than the tag computation. Be aware of the CPU cache, ...

If on Linux, look into /proc/self/maps or /proc/$$/maps (see proc(5))

Basile Starynkevitch
  • 223,805
  • 18
  • 296
  • 547
2

You cannot guarantee that on every 64 bit machine, compiler and C structure pointers aligned to 8 bytes. For instance, on Intel i86 hardware pointers don't have to be aligned at all.

If you are dealing with packed structures aligned to the nearest byte eg

struct { char foo; char *p; } __attribute__((packed)) bar_t

then pointers won't necessarily be aligned (and also probably won't work on word based architectures).

Tagged Pointers might be useful for specialised circumstances where one is writing embedded software on a fixed platform that one controls, but should be obvious from the article that it isn't portable.

It will be an accident waiting to happen when you inevitably try to reuse the code on a new platform some years later.

Dirk Koopman
  • 131
  • 5
2

It really depends on your platform (OS and such) and any reliable information about how memory allocations are handled by it.


If your program is built as x86 (32-bit address space) and you're running on a Windows OS and you don't have the Large Address Aware flag on, then you could start making assumptions about the highest bit being unused due to the legacy of how x86 programs are given memory space.

However I think that is more along the lines of playing tricks with implementation details than any kind of good practice.

After all, the whole reason the Large Address Aware flag exists is because some previous developers played these kinds of tricks back when Windows operating systems only happened to give the lower 2GB of the address space to the non-OS code. So when technology advanced such that providing more than 2GB was viable, Microsoft couldn't just open it up for all programs because there was no good way to know which would still work and which were doing funky things with pointers.

Consequently, they had to invent the Large Address Aware flag so that developers had a way to indicate their software could cope with higher address values (presumably with the awareness that it means they shouldn't play tricks using pointer bits for other purposes).


In my opinion there should usually be a better solution to whatever you're thinking of doing with those pointer bits, but it really depends on the details.

TheUndeadFish
  • 8,058
  • 1
  • 23
  • 17
0

I also recommend against this.

However, for the high bits there are many cases where it is more fundamental than the OS, since some processor does simply not use those bits.

E.g. the amd64-architecture currently only supports 48 bits: https://en.wikipedia.org/wiki/X86-64

And the original 68k from Motorola only used 24 of 32 bits - the other address lines were simply missing: https://en.wikipedia.org/wiki/Motorola_68000#Address_bus

People used that for tagged pointers, and got problems when later processors were added.

Hans Olsson
  • 11,123
  • 15
  • 38
0

Current x86-64 implementations support 48-bit virtual addresses. They require that virtual address bits [63:48] be copies of bit 47. Otherwise the address is non-canonical and will fault. (This design decision avoids future pain when creating implementations that support more virtual address bits, because code can't assume that hardware will ignore some bits of addresses, which IIRC has been an issue on previous ISAs.)

See also the canonical address section in Wikipedia's x86-64 article for a diagram.

So you can use the upper 16 bits of a pointer to store something else, but before using you need to sign-extend it from 48 to 64 bits to make it canonical. (e.g. left-shift by 16, then arithmetic right shift by 16).

This is pretty high overhead for pointers that will be dereferenced in many different places, so a separate tag is probably better. Atomic read-modify-write of separate pointer and tag is still possible with cmpxchg16b if they're adjacent. (Compilers will do this for you with compare_exchange_weak on std::atomic<two_member_struct>).

It's probably cheaper to use low bits of the address if you only need it to work for pointers that are aligned, so you can just clear them with an AND instead of two shifts. (The 64-bit AND mask can use the single byte sign-extended-imm8 encoding in machine code).


Things get a bit easier if you can assume that the upper 16 bits of your addresses are always zero, instead of needing to sign-extend. Then you can AND with a constant to zero the upper bits. But in the asm, 0x0000FFFFFFFFFFFF doesn't work as an immediate constant for an AND instruction. It would need to be put in a register with a 10-byte movabs (imm64) instruction, or used as a memory operand.

Linux normally uses addresses in the lower canonical range, but looking at less /proc/self/maps shows that the [vsyscall] page exported by the kernel is placed in the upper half. It's probably unlikely that malloc / mmap will return high addresses, but I wouldn't want to depend on it for correctness without a lot more research, and control of the conditions under which code making this assumption was used.


In the far future, when x86-64 implementations support more than 48-bit addresses, your code will have to run with a backwards-compat option that asks the OS to only ever give you memory in the upper or lower 47 bits. Presumably such an option will exist, because there is probably already some existing code which makes assumptions about canonical addresses.

Community
  • 1
  • 1
Peter Cordes
  • 328,167
  • 45
  • 605
  • 847