Do padding bits need to be preserved?

Question

The MSP430X architecture is an extension of the 16 bit MSP430 architecture to a 20 bit address space. This is done by expanding the processor's registers to 20 bit, keeping the least addressable unit at one octet (CHAR_BIT equals 8).

On this architecture, one could think of an implementation of the C programming language that provides a 20 bit integer type for int, using an 8 bit char, a 16 bit short and an emulated 32 bit long. Since 20 is not a multiple of CHAR_BIT, some padding bits are required when storing a variable of type int. For instance, one could store an int in four bytes, leaving one byte and four bits of another byte as padding.

After reading what the standard says about padding bits in integer types, I'm unsure of how they are supposed to behave. Since in this case the padding only exists for storage, its value can neither be set nor observed except by type punning. And even then, copying an object of this 20 bit type does not copy any padding bits. Is this kind of padding bit allowed by ISO 9899:2011?

I would say that it would be legal, but highly impractical since a lot of code has been written assuming that int:s are 16 (or 32) bits. An alternative approach would be to add an additional 20 bit integer type, as an extension to the standard, like `__int20_t`. Standard types like `size_t` and `intptr_t` could be typedef:ed to this new type, when pointers are configured to be 20 bits. — Lindydancer, Jul 18 '15 at 22:20
@Lindydancer: As you write "a lot of code ... assuming ...". Well, I think even more code has been written to use `int` (sic!) instead of `size_t`, or `long` instead of `(u)intptr_t` (on the MSP430, one should prefer unsigned types, btw.). And the compiler is free to use 20 bit instructions for `(u)intptr_t`, as the standard does not even define arithmetic operations for this type. — too honest for this site, Jul 19 '15 at 01:48
@Olaf The standard says that `uintptr_t` is an unsigned integer type; all arithmetic on `uintptr_t` has to work just as on every other unsigned integer type. `intptr_t` behaves in an analogous fashion. — fuz, Jul 19 '15 at 01:51
Ok, so there would still be a distinct 20 bit type required. Hmm.... ok, just read my last comment to MattMcNabb on my answer. I do remember now, why I like my ARMs so much:-) — too honest for this site, Jul 19 '15 at 02:07
@Olaf Not necessarily. `uintptr_t` is defined as a type such that a pointer survives a round-trip through that type. It doesn't have to have the same size as a pointer type. — fuz, Jul 19 '15 at 02:20
@FUZxxl: Yes, but I thought about the compiler using 20 bit arithmetic for objects of this type, as long arithmetic is too costly. Point is, this type mostly makes sense if you have to manipulate a pointer value - deep down the software stack or at OS-level. Just for storage/passing, you can use `void *` as well. I stand by my recommendation not to use such architectures if you really need the extended addressing range, but use a Cortex-M class CPU. This avoids all these problems and you can use a standard compiler. — too honest for this site, Jul 19 '15 at 02:45
@Olaf BTW, the gcc people have [implemented](http://people.redhat.com/~dj/msp430/size-optimizations.html) such a 20 bit type for `size_t` and possibly `uintptr_t` on MSP430X targets to reduce code size and improve speed. — fuz, Aug 07 '15 at 12:55
@FUZxxl: "The second big change is some ongoing work ...". We will see. I do not use experimental code for projects, but good to know that someday ... (it already took years for gcc mainline to support the MSP430 at all). Still not much use in the larger MSP430s. Even TI seems to be moving away from that CPU: MSP432 is Cortex-M4(!) based. Let's wait and see. — too honest for this site, Aug 07 '15 at 13:03
@Olaf I just checked, these changes are part of the current [gcc toolchain](http://www.ti.com/tool/msp430-gcc-opensource) available on the TI site. — fuz, Aug 07 '15 at 13:11

score 3 · Accepted Answer · answered Jul 18 '15 at 20:32

The C standard does not require padding bits to be copied by assignment. Assignment is specified in terms of values, not representations.

N1570 6.2.6.2p5 says:

The values of any padding bits are unspecified.

That's an unqualified statement, implying that they're unspecified in all circumstances, even after an assignment from an object that has some padding bits set.

By itself, that statement might be considered vague enough that it doesn't firmly establish that padding bits aren't necessarily copied.

Padding bits do not contribute to the value of an integer object. A footnote on the quoted sentence says:

All other combinations of padding bits are alternative object representations of the value specified by the value bits.

(The "other" refers to trap representations.)

6.5.16.1p2, describing simple assignment, says:

In simple assignment (=), the value of the right operand is converted to the type of the assignment expression and replaces the value stored in the object designated by the left operand.

The description is in terms of values not representations; there is no implication that the representation of the RHS must be maintained in the LHS object. And of course the RHS in an assignment can be an arbitrary expression, not just an object reference. Even if it is just the name of an object, it undergoes lvalue conversion, described in 6.3.2.1p2; this conversion refers only to the value of the object, not to its representation.

(Elsewhere, the standard says that parameter passing, function argument passing, and returning a value from a function behave like simple assignment.)

It is required that following any assignment, padding bits have a value which would allow the value to be read back. An implementation could legitimately specify e.g. that one of the padding-bit values must be the "xor" of two or more value bits, and attempts to read an integer where that isn't true may have arbitrary consequences; an implementation that sometimes validates padding bits on reading is required to always set them to correct values on writing. — supercat, Jul 20 '15 at 23:45

too honest for this site · Answer 2 · 2015-07-19T01:58:28.940

1

In general, the standard places some constraints on the size of a type. The basic constraint is that it has to be a multiple of the size of char, with sizeof(char) defined as 1.

For padding bits within a type, refer to 6.2.6.1, which leaves the representation mostly implementation defined. 6.2.6.2p5 states that the value of padding bits is unspecified; there is no need to preserve them. However, there are two important constraints on the padding bits:

A non-negative value in a signed integer type shall represent the same value in the corresponding unsigned type. This guarantees compatibility between signed and unsigned variants of the same type for non-negative values within the range of the signed variant.
If all bits are zero (padding bits included), this represents the value 0. However, the reverse is not true: the value 0 may have other representations as well (thanks to MattMcNabb).

Both include padding bits as they are part of the internal representation. From a more practical view, padding bits should be set to zero unless there are parity, etc. bits which depend on the other bits (yet the 2nd constraint has to be met).

That is a rough interpretation. For details, refer to the rest of cited sections.

On MSP430X, a 20 bit int is of little practical use. The 20 bit registers are mostly meant to extend the addressing range, not for integer arithmetic (although the instruction set apparently supports it - I was wrong here in a former edit).

Pointers have a size of 32 bits (four 8-bit bytes), but only use 20 bits. Some embedded compilers might support special short/near/... qualifiers, effectively providing two different pointer sizes. This is, however, actually against the standard. (I'm a bit ambivalent here: optimization or portability.)

MSP430X is one of the platforms where using the dedicated types from stdint.h (uintptr_t) and stddef.h (e.g. size_t) is essential, as casting a pointer to/from int will eventually fail. Moreover, it makes clear why the standard's only requirement for (u)intptr_t is temporary storage of a pointer value, with no pointer-specific operations. Consequently, nothing is guaranteed about the padding bits, not even for the null pointer.

The reason for this large overhead (37.5% unused bits) is that the MSP430X has no instructions to read/write 20 bit or even 24 bit values to/from memory (and packed storage would make array indexing very costly). Only some constants can be 20 bits, as they are encoded in the instruction using an extension word which includes 4 bits, while the remaining 16 bits follow the opcode as for other instructions. This is likely one of the last (small) architectures to show how much additional effort is needed for address space expansion while maintaining compatibility.

Note that the MSP430X has some additional pitfalls for 20 bit addressing modes. For instance, interrupt handlers have to reside in the lower 64 KiB, as the vector table only contains 16 bit entries. This actually prohibits the vector table from being defined in C as an array of function pointers (as they cannot be freely converted to any other function pointer and back).

edited Jul 19 '15 at 01:58

answered Jul 18 '15 at 20:10

too honest for this site


I'm not saying that this is the ABI, I'm saying that one *could* use an ABI like this. I do know that compilers for the MSP430X usually implement an ABI that does not use this kind of model. I'm more interested in whether an ABI like the one I outline is allowed or not. – fuz Jul 18 '15 at 20:13
I'm also downvoting this answer because it does not answer my question. – fuz Jul 18 '15 at 20:14
I removed the ABI part. However, you are mixing `int` and pointers in your question. – too honest for this site Jul 18 '15 at 20:16
As with your edit: There's no need to read or write 20 bits, the `int` type is stored in an area of 32 bits of which 12 are padding. As far as I know the MSP430X supports such storage operation, mostly because they are required to make the 20 bit pointers actually useful. – fuz Jul 18 '15 at 20:16
Where am I mixing `int` and pointers? The MSP430X architecture does not have dedicated pointer registers, my first paragraph just outlines why the architecture has a register size of 20 bits to provide some context for why such a weird ABI could be designed. – fuz Jul 18 '15 at 20:17
Did you actually read the user's guide? Along with the C standard and some thought about speed penalties and code-space, that should answer your question already. – too honest for this site Jul 18 '15 at 20:18
Also, you didn't remove the ABI part completely, the first paragraph remains and its content is actually wrong. The standard does not require `int` to have 16 bits, it just requires the range -32767 .. 32767 to be a subset of the valid range for `int`. One could without problem have a 20 bit `int` with a valid range of -524288 .. 524287. – fuz Jul 18 '15 at 20:19
Again, I'm not talking about what is actually done. I'm talking about the theoretical case of an ABI with a 20 bit `int`. Yes, it is possible to use an ABI with a `16` bit `int`, no, I'm not interested in such an ABI with respect to this question. – fuz Jul 18 '15 at 20:20
I'm in agreement with OP that this answer does not answer the question, but it is a very good point: even though it is allowed by the standard, having a 20-bit `int` that takes up 3 bytes with padding would be an atrociously bad implementation choice. – R.. GitHub STOP HELPING ICE Jul 18 '15 at 22:03
The question was whether padding bits are preserved. I don't see how this answers that question. It's probably a good answer to some other question. – Keith Thompson Jul 19 '15 at 00:46
@KeithThompson: Answer seeking for a question? Well, you are somewhat right, but I am in good company (The answer is 42 - what is the question?). Anyway, I edited my answer, hoping it now _does_ answer the question, and left the remainder for completeness, and because I think it might help OP. I'd appreciate if you re-read. – too honest for this site Jul 19 '15 at 01:35
*"From a more practical view, padding bits should be set to zero unless there are parity, etc. bits which depend on the other bits ..."* -- The standard doesn't support this. Other than the requirement that all-bits-zero is a representation of zero, an implementation might *require* some padding bits to be non-zero for at least some non-trap representations. For example, there might be a parity bit. – Keith Thompson Jul 19 '15 at 01:46
Re. your point 2: it says that all-bits-zero must represent `0`; not that `0` must only be represented by all-bits-zero. – M.M Jul 19 '15 at 01:47
@MattMcNabb: Yes, I was a bit too enthusiastic. The standard actually allows for other 0 representations, too. (One reason I'm really happy to work with 32 bit CPUs currently is not having to care about this kind of bit-twiddling.) 16 bit CPUs should not have more than 64 KiB address space. (yeah, 8 bit CPUs ... - well :-) – too honest for this site Jul 19 '15 at 02:01
