Relation between numeric representation of memory address and alignment?

Question

Example:

std::ptrdiff_t dist(void* a, void* b)
{
    return static_cast<std::uint8_t*>(b) - static_cast<std::uint8_t*>(a);
}

Align8Type align8; // alignof(Align8Type) == 8
std::uintptr_t(&align8) & 3; // [1]
dist(nullptr, &align8) & 3; // [2]
Align8Type* p = reinterpret_cast<Align8Type*>(static_cast<std::uint8_t*>(nullptr) + dist(nullptr, &align8));
assert(&align8 == p); // [3]

Assuming std::uint8_t is supported, are the results of [1] and [2] guaranteed to be 0, and is [3] guaranteed to be true, according to the C++ standard? If not, what about in practice?

You seem to assume a byte addressed machine, which is not guaranteed by the standard. — Bo Persson, Jan 12 '16 at 07:32
The fact that such machines exist. See for example [Exotic architectures the standard committees care about](http://stackoverflow.com/questions/6971886/exotic-architectures-the-standards-committees-care-about) — Bo Persson, Jan 12 '16 at 07:46
The standard doesn't even guarantee the existence of `uint8_t` — MikeMB, Jan 12 '16 at 07:47
`uint8_t` must exist on an 8-bit-addressed machine. Clearly this question only pertains to such machines — M.M, Jan 12 '16 at 08:41
Doesn't the C++ memory model define a byte as `char` independently of the underlying machine's representation, @BoPersson? Also, a byte is not necessarily an octet, if that is what you wanted to express. — Ulrich Eckhardt, Jan 12 '16 at 14:10
@Ulrich - Yes, the language standard defines a byte as the space a `char` occupies. It doesn't require that each char resides at a distinct memory address. For example, the old Univac I refer to in my link stored four 9-bit characters in each 36-bit word. As a memory address was much shorter than 36 bits, a char-pointer could store info about "part-word access" to the individual characters. And everything else was 36-bit aligned by default. On the other hand, it wouldn't have the type `uint8_t` defined so the code above wouldn't work anyway. — Bo Persson, Jan 12 '16 at 15:37

score 4 · Accepted Answer · edited Jun 20 '20 at 09:12

The standard makes no guarantees about the representation of a pointer [Note 1]. It is not necessarily the case that the values of a pointer map directly into consecutive integers, nor that pointers to types with different alignments have the same representation. So any of the following are possible:

- Segment/offset representation where the segment number occupies the low-order bits of the pointer representation.
- Pre-aligned representation, where the low-order 0s of the address of an object with known alignment are deleted from the representation.
- Flagged representation, where the low-order bit(s) of pointers to certain object types are used to identify an aspect of the type, and do not participate in address resolution. (An example of this would be a hardware-assisted garbage-collection architecture in which the low order bits of pointers to types large enough to be pointers are repurposed as GC flags.)
- Subword addressing representations, where the underlying hardware is word-addressed (and a word is considerably longer than 8 bits), but a hardware or software solution is available for byte addressing where a byte pointer consists of a pair of word address / subword offset. In this case, a byte pointer will be larger than a word pointer, which is allowed by the standard.

I'm sure there are other possibilities.

An alignment must be a power of 2, but there is no guarantee that more than one alignment exists. It is entirely possible for all types to have alignment 1. So it may well be that, on a given architecture, it is impossible to meaningfully define Align8Type.

Given all the above, my interpretation:

1. `std::uintptr_t(&align8) & 3 == 0`

   False. Even if Align8Type is definable, there is no guarantee that the conversion of Align8Type* to std::uintptr_t is to a number divisible by 8. On a 32-bit word addressed machine, for example, the underlying hardware address mod 8 could be 0, 2, 4 or 6.

2. `dist(nullptr, &align8) & 3 == 0`

   False. The subtraction of nullptr from a pointer to an object is Undefined Behaviour. (§5.7/5: "Unless both pointers point to elements of the same array object, or one past the last element of the array object, the behavior is undefined.")

3. `reinterpret_cast<Align8Type*>(static_cast<std::uint8_t*>(nullptr) + dist(nullptr, &align8)) == &align8`

   False. First, as per 2., the invocation of dist is Undefined Behaviour. Second, adding that value to a null pointer is Undefined Behaviour.
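For contrast with points 2 and 3, here is a minimal sketch of pointer arithmetic that stays inside a single array, which is the case the quoted rule permits. `Align8Type` is a hypothetical stand-in declared with `alignas(8)` (assuming the implementation supports that alignment), and the function names are illustrative only, not taken from the question.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the question's type.
struct alignas(8) Align8Type { char data[8]; };

// Element distance between two pointers.
// Precondition: both point into (or one past the end of) the SAME array.
std::ptrdiff_t element_distance(const Align8Type* first, const Align8Type* last)
{
    return last - first;
}

// Re-derive `last` from `first` plus the measured distance.
// The result never leaves the array, so the addition is defined.
const Align8Type* advance_by_distance(const Align8Type* first, const Align8Type* last)
{
    return first + element_distance(first, last);
}

// Small self-check exercising the defined cases.
void check_same_array_arithmetic()
{
    Align8Type items[4] = {};
    assert(element_distance(&items[0], &items[3]) == 3);
    assert(advance_by_distance(&items[0], &items[3]) == &items[3]);
    // One past the last element is also a valid operand.
    assert(element_distance(&items[0], items + 4) == 4);
}
```

Neither operand here is a null pointer, and neither points outside `items`; those are exactly the two conditions that `dist(nullptr, &align8)` violates.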

Round-trip conversion of T1* to T2* and back to T1* is guaranteed provided that the alignment requirements of T2 are no stricter than those of T1 (§5.2.10/7). In this case, T1 is Align8Type and T2 is uint8_t, and the alignment restriction presumably holds, so if it were not for the undefined behaviour of the arithmetic, this would work. That is, you could cast &align8 to uint8_t* and then cast it back to Align8Type*. You could even add the integer 0 to the intermediate uint8_t* pointer, but no other integer.
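The round trip described above can be sketched as follows. This assumes `std::uint8_t` exists and that `alignas(8)` is supported; `Align8Type` is a hypothetical stand-in and the function names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the question's type.
struct alignas(8) Align8Type { char data[8]; };

// The guarantee only applies when the intermediate type is no more
// strictly aligned than the original type.
static_assert(alignof(std::uint8_t) <= alignof(Align8Type),
              "intermediate type must not be more strictly aligned");

// Cast to std::uint8_t*, add zero, cast back, and compare.
bool round_trip_preserves_pointer(Align8Type* original)
{
    std::uint8_t* bytes = reinterpret_cast<std::uint8_t*>(original);
    bytes = bytes + 0;  // adding zero leaves the pointer value unchanged
    Align8Type* restored = reinterpret_cast<Align8Type*>(bytes);
    return restored == original;
}

// Small self-check.
void check_round_trip()
{
    Align8Type align8 = {};
    assert(round_trip_preserves_pointer(&align8));
}
```

Unlike [3] in the question, nothing here starts from a null pointer or depends on the integer value of the address.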

Do these identities work in practice? They probably work on C++ implementations on 8-bit byte-addressed 2's complement machines, which are pretty common (a lot more common than the theoretical beasts mentioned above, which are, statistically speaking, as common as unicorns). But technically, they render your code non-portable. I have no idea what aggressive optimizations might do to the UB mentioned in points 2 and 3, so I wouldn't suggest risking it in production code.
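If the goal in practice is simply to test whether a pointer is suitably aligned, one option that avoids interpreting pointer values as integers in your own code is `std::align` from `<memory>`. The following is a sketch, not a recommendation from the answer above; `is_aligned` is a hypothetical helper name, and the standard library implementation may itself rely on platform-specific knowledge internally.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Returns true if p is already aligned to `alignment`.
// Precondition: alignment is a power of two, as std::align requires.
bool is_aligned(void* p, std::size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);

    // Room for the largest possible adjustment (alignment - 1) plus
    // the one byte we ask std::align to place.
    std::size_t space = alignment;
    void* candidate = p;

    // std::align moves `candidate` forward to the first suitably
    // aligned address; if that address equals p, p was aligned.
    return std::align(alignment, 1, candidate, space) == p;
}
```

For example, given `alignas(8) unsigned char buffer[16];`, the call `is_aligned(buffer, 8)` is expected to return true, while `is_aligned(buffer + 1, 8)` is expected to return false.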

Notes:

§3.9.2/3:

The value representation of pointer types is implementation-defined.

§5.2.10/4:

A pointer can be explicitly converted to any integral type large enough to hold it. The mapping function is implementation-defined. [ Note: It is intended to be unsurprising to those who know the addressing structure of the underlying machine. —end note ]

I reproduced the note, because it is interesting: in order to understand the representation of an address as an integer, you must understand the underlying machine's addressing structure (which, by implication, might not be as simple as a contiguous sequence of integers).
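On a machine whose addressing structure is a flat sequence of byte addresses, the check in [1] is usually written as below. This is a sketch under exactly that assumption: the pointer-to-integer mapping is implementation-defined, `std::uintptr_t` is an optional type, and `looks_aligned` is a hypothetical name. On the exotic representations listed earlier the result would be meaningless.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// ASSUMPTION: flat, byte-addressed memory, where the integer obtained
// from a pointer is the byte address. The standard does not promise this.
// Precondition: alignment is a power of two.
bool looks_aligned(const void* p, std::size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    const std::uintptr_t value = reinterpret_cast<std::uintptr_t>(p);
    // For a power-of-two alignment, the low bits must all be zero.
    return (value & (alignment - 1)) == 0;
}
```

Note the explicit parentheses around the mask: `==` binds more tightly than `&`, so `value & 3 == 0` would not test what it appears to.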

Most of us seem to assume the numeric representation of a pointer directly maps to the consecutive address, even in Boost.Align! — Jamboree, Jan 13 '16 at 15:57
@Jamboree: And it does, on all modern architectures that I know of. But the standard doesn't require it to. (Many low-level libraries also assume that you can freely subtract pointers which do not point into the same array, but the standard is quite clear that you cannot. It's valid for a library to do that if it has some preprocessing protection which limits its applicability to architectures in which the assumption is known to be correct.) — rici, Jan 13 '16 at 16:13

Adrian Maire · Answer 2 · 2016-01-12T09:16:40.773

score 0

In the C++ standard,

Objects declared as characters (char) shall be large enough to store any member of the implementation’s basic character set.

The fundamental storage unit in the C++ memory model is the byte. A byte is at least large enough to contain any member of the basic execution character set (2.3) and the eight-bit code units of the Unicode UTF-8 encoding form and is composed of a contiguous sequence of bits, the number of which is implementation defined.

Every byte has a unique address.

A std::uint8_t is not necessarily a byte, and a byte is not necessarily 8 bits.
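The byte-size facts quoted above can be checked directly. This is a minimal sketch that assumes the optional type `std::uint8_t` is available; the helper names are illustrative. Where `std::uint8_t` does exist, it has exactly 8 bits with no padding and occupies at least one byte, so on such an implementation a byte is necessarily 8 bits wide.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// sizeof is measured in bytes, and sizeof(char) is 1 by definition.
static_assert(sizeof(char) == 1, "a char occupies exactly one byte");

// A byte must be able to hold UTF-8 code units, so it has at least 8 bits.
static_assert(CHAR_BIT >= 8, "a byte has at least 8 bits");

// True when the implementation's byte is an octet.
constexpr bool byte_is_octet() { return CHAR_BIT == 8; }

// Holds on any implementation that provides std::uint8_t at all.
void check_uint8_is_one_byte()
{
    assert(sizeof(std::uint8_t) == 1);
    assert(byte_is_octet());
}
```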

Are the results of [1] & [2] guaranteed to be 0?

Supposing an Align8Type object has an 8-byte-aligned address:

[1] Yes, by the supposition above.

[2] Yes. Even if a byte could be bigger than a std::uint8_t, supposing the Align8Type object has an 8-byte-aligned address, the address will be a multiple of 8. (A std::uint8_t is smaller than or equal to a byte.)

Is [3] guaranteed to be true by the C++ standard?

No: dist returns the distance between the two pointers in units of std::uint8_t, not the address distance.

EDITED:

edited to answer the redefined question.

edited Jan 12 '16 at 09:16

answered Jan 12 '16 at 07:46


I wasn't very clear in the question, I added some prerequisite and used `std::uint8_t` to avoid the `char` problem. Note that in [3] it's `std::uint8_t* + dist` and then cast back to `Align8Type*`. – Jamboree Jan 12 '16 at 08:08

1 char = 1 byte, this is defined by the standard – M.M Jan 12 '16 at 08:43
@M.M Do you have any reference to that assertion? – Adrian Maire Jan 12 '16 at 09:09
@Adrian Maire Yes, it's defined in the standard. – Zimano Jan 12 '16 at 09:23
OK, I found it: 5.3.3. I do not know why it is specified under the sizeof operator rather than in the specification of char. – Adrian Maire Jan 12 '16 at 09:29
All I can find is that since C++14 `char` must be implicitly convertible to and from a type with 256 values and that `unsigned char` can be used to inspect objects at the raw-memory level. – PeterT Jan 12 '16 at 09:32
the standard `byte` does not necessarily need to be 8-bits. Although since `sizeof()` and all other pointer operations are defined based on that standard `byte` it doesn't really matter for this question. – PeterT Jan 12 '16 at 09:39
@PeterT: the question was slightly different in the first version. That is the reason for the char vs. bytes discussion. – Adrian Maire Jan 12 '16 at 09:41
@AdrianMaire I remember the standard doesn't define the mapping between a pointer and its numeric representation. Any source to back up your answer? – Jamboree Jan 12 '16 at 09:51
@Jamboree: "The memory available to a C++ program consists of one or more sequences of contiguous bytes." I understand contiguous as address contiguous, not as hardware contiguous (which does not mean anything). – Adrian Maire Jan 12 '16 at 10:12
