Recently I was reading about the small string optimization (SSO) in "What are the mechanics of short string optimization in libc++?". As we know, a string typically consists of 3 pointers, which is 24 bytes on a 64-bit system. The linked answer says that in libc++'s implementation, the very first bit of the first pointer is used to indicate whether the string is in "long" or "short" mode, i.e. heap allocation with external storage vs. internal storage of up to some 22 characters.
This assumes, however, that the first bit of the first pointer can never meaningfully be part of the address, because whenever the string is in "long" mode, that bit will always be set (or unset, depending on which convention was chosen). This seems reasonable on its face: 64-bit pointers allow 2^64 distinct addresses, which is about 1.8 × 10^19 bytes, or more than 16 billion gigabytes.
So the assumption is reasonable, though not certain. My question is: is this guaranteed somewhere? And if it is guaranteed, where? By the architecture spec, or by something else? To take it a step further: how many bits is it safe to do this with? I have a vague recollection of reading somewhere that only 48 bits are used, but I'm not sure.
If some number of bits, e.g. 8 or 16, are guaranteed to be untouched, that is certainly something that could be leveraged in some interesting ways. It would be nice to exploit this, but not at the cost of having code mysteriously fail on some machine.
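
To make concrete what I have in mind, here is a minimal sketch of packing a 16-bit tag into the upper bits of a pointer. It rests entirely on the unverified assumption that only the low 48 bits of a 64-bit pointer are ever significant, which is exactly what I am asking about. The names `kAddrBits`, `high_bits_unused`, `pack`, `unpack_ptr` and `unpack_tag` are hypothetical and not taken from libc++ or any other library.

```cpp
#include <cassert>
#include <cstdint>

static_assert(sizeof(std::uintptr_t) == 8, "sketch assumes 64-bit pointers");

// ASSUMPTION (the subject of this question): only the low 48 bits of a
// pointer carry address information on the target platform.
constexpr unsigned kAddrBits = 48;
constexpr std::uintptr_t kAddrMask = (std::uintptr_t{1} << kAddrBits) - 1;

// True if every bit above the assumed address width is zero for this pointer.
inline bool high_bits_unused(const void* p) {
    return (reinterpret_cast<std::uintptr_t>(p) & ~kAddrMask) == 0;
}

// Store a 16-bit tag in the upper 16 bits of the pointer's integer value.
// The assert makes a violated assumption fail loudly in debug builds
// instead of silently corrupting the address.
inline std::uintptr_t pack(const void* p, std::uint16_t tag) {
    assert(high_bits_unused(p));
    return (reinterpret_cast<std::uintptr_t>(p) & kAddrMask) |
           (std::uintptr_t{tag} << kAddrBits);
}

// Recover the pointer by clearing the tag bits.
inline void* unpack_ptr(std::uintptr_t word) {
    return reinterpret_cast<void*>(word & kAddrMask);
}

// Recover the tag from the upper 16 bits.
inline std::uint16_t unpack_tag(std::uintptr_t word) {
    return static_cast<std::uint16_t>(word >> kAddrBits);
}
```

With this, `unpack_ptr(pack(&x, tag))` gives back `&x` only when the upper 16 bits of `&x` were zero to begin with. Whether that can be relied on, and who guarantees it, is what I would like to know.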