Data layouts used by C compilers (the alignment concept)

Question

Below is an excerpt from the red dragon book.

Example 7.3. Figure 7.9 is a simplification of the data layout used by C compilers for two machines that we call Machine 1 and Machine 2.

Machine 1 : The memory of Machine 1 is organized into bytes consisting of 8 bits each. Even though every byte has an address, the instruction set favors short integers being positioned at bytes whose addresses are even, and integers being positioned at addresses that are divisible by 4. The compiler places short integers at even addresses, even if it has to skip a byte as padding in the process. Thus, four bytes, consisting of 32 bits, may be allocated for a character followed by a short integer.

Machine 2: each word consists of 64 bits, and 24 bits are allowed for the address of a word. There are 64 possibilities for the individual bits inside a word, so 6 additional bits are needed to distinguish between them. By design, a pointer to a character on Machine 2 takes 30 bits — 24 to find the word and 6 for the position of the character inside the word. The strong word orientation of the instruction set of Machine 2 has led the compiler to allocate a complete word at a time, even when fewer bits would suffice to represent all possible values of that type; e.g., only 8 bits are needed to represent a character. Hence, under alignment, Fig. 7.9 shows 64 bits for each type. Within each word, the bits for each basic type are in specified positions. Two words consisting of 128 bits would be allocated for a character followed by a short integer, with the character using only 8 of the bits in the first word and the short integer using only 24 of the bits in the second word. □

I found about the concept of alignment here ,here and here. What I could understand from them is as follows: In word addressable CPUs (where size is more than a byte), there certain paddings are introduced in the data objects, such that CPU can efficiently retrieve data from the memory with minimum no. of memory cycles.

Now the Machine 1 here is actually a byte address one. And the conditions in the Machine 1 specification are probably more difficult than a simple word addressable machine having word size of say 4 bytes. In such a 64 bit machine, we need to make sure that our data items are just word aligned ,no more difficulty. But how to find the alignment in systems like Machine 1 (as given in the table above) where the simple concept of word alignment does not work, because it is byte addressable and has much more difficult specifications.

Moreover I find it quite weird that in the row for double the size of the type is more than what is given in the alignment field. Shouldn't alignment(in bits) ≥ size (in bits) ? Because alignment refers to the memory actually allocated for the data object (?).

"each word consists of 64 bits, and 24 bits are allowed for the address of a word. There are 64 possibilities for the individual bits inside a word, so 6 additional bits are needed to distinguish between them. By design, a pointer to a character on Machine 2 takes 30 bits — 24 to find the word and 6 for the position of the character inside the word." - Moreover how should this statement about the concept of the pointers, based on alignment is to be visualized (2^6 = 64, it is fine but how is this 6 bits correlating with the alignment concept)

Antti Haapala -- Слава Україні · Accepted Answer · 2021-03-31T18:32:31.570

First of all, the machine 1 is not special at all - it is exactly like a x86-32 or 32-bit ARM.

Moreover I find it quite weird that in the row for double the size of the type is more than what is given in the alignment field. Shouldn't alignment(in bits) ≥ size (in bits) ? Because alignment refers to the memory actually allocated for the data object (?).

No, this isn't true. Alignment means that the address of the lowest addressable byte in the object must be divisible by the given number of bytes.

Additionally, with C, it is also true that within arrays sizeof (ElementType) will need to be greater than or equal to the alignment of each member and sizeof (ElementType) be divisible by alignment, thus the footnote a. Therefore on the latter computer:

 struct { char a, b; }

might have sizeof 16 because the characters are in distinct addressable words, whereas

struct { char a[2];  }

could be squeezed into 8 bytes.

how should this statement about the concept of the pointers, based on alignment is to be visualized (2^6 = 64, it is fine but how is this 6 bits correlating with the alignment concept)

As for the character pointers, the 6 bits is bogus. 3 bits are needed to choose one of the 8 bytes within the 8-byte words, so this is an error in the book. An ordinary byte would select just a word with 24 bits, and a character (a byte) pointer would select the word with 24 bits, and one of the 8-bit bytes inside the word with 3 bits.

The other interesting fact here is that C allows pointers to different types to have different formats, and the "Machine 2" implementation takes advantage of that fact. This all works because it is not guaranteed that the conversion of a pointer to an integer type results in a value measured in bytes; in other words, you cannot portably test whether a pointer is "correctly aligned" by simply converting it to `intptr_t` and checking the remainder with the type's alignment. (That test should work if you first convert to `void*`, but I don't think even that is actually guaranteed.) — rici, Mar 31 '21 at 16:40

Data layouts used by C compilers (the alignment concept)

1 Answers1