README

A "trap value", or "trap representation" for type T, is a bit combination (of the underlying storage) that yields an invalid value of T. Trying to interpret the representation of an invalid value will cause undefined behavior.


Let the battle begin..

Another question has started a heated discussion regarding char, and the possibility of an implementation having trap representations for it.

Question:

  • Can char possibly have trap values?

Quotes that have been mentioned in the previous discussion:

These sections were the most quoted during the previous discussion; do they contradict each other?

3.9.1p1 Fundamental types [basic.fundamental]

It is implementation-defined whether a char can hold negative values. Characters can be explicitly declared signed or unsigned.

A char, a signed char, and an unsigned char occupy the same amount of storage and have the same alignment requirements (3.11); that is, they have the same object representation. For character types, all bits of the object representation participate in the value representation.

For unsigned character types, all possible bit patterns of the value representation represent numbers. These requirements do not hold for other types.

In any particular implementation, a plain char object can take on either the same values as a signed char or an unsigned char; which one is implementation-defined.

3.9p2 Types [basic.types]

For any object (other than a base-class subobject) of trivially copyable type T, whether or not the object holds a valid value of type T, the underlying bytes (1.7) making up the object can be copied into an array of char or unsigned char.

If the content of the array of char or unsigned char is copied back into the object, the object shall subsequently hold its original value.
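The round trip that 3.9p2 describes can be sketched as follows. This is only an illustration: `Sample`, `round_trip`, and `demo_round_trip` are hypothetical names that do not come from the standard or the discussion, and `std::memcpy` is used as one common way to copy the underlying bytes.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical trivially copyable type, used only for illustration.
struct Sample {
    int id;
    double weight;
};

// Copies the object representation of `src` into an array of unsigned char,
// then copies those bytes back into a fresh object of the same type.
Sample round_trip(const Sample& src) {
    unsigned char bytes[sizeof(Sample)];
    std::memcpy(bytes, &src, sizeof(Sample));

    Sample dst;
    std::memcpy(&dst, bytes, sizeof(Sample));
    return dst;
}

// Usage: after the round trip the copy holds the original value.
void demo_round_trip() {
    const Sample original{7, 2.5};
    const Sample copy = round_trip(original);
    assert(copy.id == original.id);
    assert(copy.weight == original.weight);
}
```

The sketch only moves bytes through the array; it never reads an individual array element as a `char` value, which is the operation the question is really about.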

Filip Roséen - refp
  • Trap Values: http://stackoverflow.com/questions/6725809/trap-representation/6725981#6725981 – QuestionC Jun 04 '14 at 10:14
  • From the C++11 Standard "An unsigned char object with indeterminate value allocated to a register might trap." - if that's the case, then I'd expect it to be true of `char` and `signed char`. – Tony Delroy Jun 04 '14 at 10:27
  • Filip, can we consider AddressSanitizer/MemorySanitizer (an optional feature in modern LLVM and GCC) a conforming language implementation? The sanitizers add tags to every memory word (stored in separate memory, modelled after the tags of the Soviet Elbrus architecture; some large Burroughs systems used a similar idea), and some tag values cause traps, for example on a read of uninitialized memory. – osgx Jun 04 '14 at 10:30
  • The C++11 Standard also explicitly lists `numeric_limits<>` specialisations for `unsigned`-, `signed`- and "unspecified" `char`, and there's a `static constexpr bool traps` member so you can check at compile time, assert if you care etc.. – Tony Delroy Jun 04 '14 at 10:32
  • Can the down-voters please explain why this question is attracting **-1**? – Filip Roséen - refp Jun 04 '14 at 11:56

1 Answer

The standard tells us there must be:

  • char, signed char, unsigned char, all the same size
  • sizeof(char) is 1
  • char has at least 8 bits
  • every bit combination is meaningful and valid (the quoted text guarantees this explicitly for unsigned char)
  • an array of char is packed (or behaves as if it is).
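Most of the points above can be checked at compile time. The following is a minimal sketch; the constant `plain_char_is_signed` is a name introduced here for illustration. It does not settle the question about trap values, since none of these checks can observe an invalid bit pattern.

```cpp
#include <climits>
#include <limits>

// All three character types have size 1.
static_assert(sizeof(char) == 1, "sizeof(char) is 1 by definition");
static_assert(sizeof(signed char) == 1 && sizeof(unsigned char) == 1,
              "signed and unsigned char occupy the same storage as char");

// A byte has at least 8 bits.
static_assert(CHAR_BIT >= 8, "char has at least 8 bits");

// For unsigned char every bit of the object representation is a value bit,
// so the number of value bits equals CHAR_BIT.
static_assert(std::numeric_limits<unsigned char>::digits == CHAR_BIT,
              "unsigned char has no padding bits");

// An array of char has no padding between its elements.
static_assert(sizeof(char[16]) == 16, "array of char is packed");

// Whether plain char is signed is implementation-defined; this records the choice.
constexpr bool plain_char_is_signed = std::numeric_limits<char>::is_signed;
```

If any of these assertions failed, the implementation would not be conforming, so the interesting case is the one they cannot express: whether some bit pattern of `char` or `signed char` fails to represent a value.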

There isn't much wiggle room.

Nevertheless, there are suggestions that a trap might occur during certain kinds of operations, such as loading uninitialised memory or performing conversions.

Yes, I think an implementation could have a trap representation where trap values could occur as a result of some kind of undefined or unspecified behaviour, including evaluating expressions that involve unspecified/uninitialised values. The actual bit pattern leading to a trap value would be invisible to the implementation.

Such a CPU could have 9 bit bytes where only 8 bits are visible to the compiler and runtime, and the 9th bit is used to detect uninitialised memory, and will trigger a trap if loaded by (unprivileged) instructions.
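A software model of such a machine might look like the sketch below. Everything in it is hypothetical: the class `TaggedMemory`, the tag layout, and the use of a C++ exception to stand in for a hardware trap are assumptions made for illustration, not a description of any real CPU or sanitizer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical model: each cell holds 8 visible data bits plus a hidden
// 9th bit that records whether the byte has ever been stored to.
class TaggedMemory {
public:
    // All cells start with the tag bit clear, i.e. uninitialised.
    explicit TaggedMemory(std::size_t size) : cells_(size, 0) {}

    // A store writes the 8 data bits and sets the hidden tag bit.
    void store(std::size_t addr, std::uint8_t value) {
        cells_.at(addr) = static_cast<std::uint16_t>(kInitTag | value);
    }

    // A load "traps" (modelled here as an exception) if the tag bit is clear;
    // otherwise it returns only the 8 visible bits.
    std::uint8_t load(std::size_t addr) const {
        const std::uint16_t cell = cells_.at(addr);
        if ((cell & kInitTag) == 0) {
            throw std::runtime_error("trap: load of uninitialised byte");
        }
        return static_cast<std::uint8_t>(cell & 0xFF);
    }

private:
    static constexpr std::uint16_t kInitTag = 0x100;  // the hidden 9th bit
    std::vector<std::uint16_t> cells_;
};

// Usage: an initialised byte loads normally, an untouched byte traps.
void demo_tagged_memory() {
    TaggedMemory mem(4);
    mem.store(1, 0xAB);
    assert(mem.load(1) == 0xAB);

    bool trapped = false;
    try {
        mem.load(0);
    } catch (const std::runtime_error&) {
        trapped = true;
    }
    assert(trapped);
}
```

In this model all 256 visible bit patterns remain valid values; the trap depends only on the hidden tag, which the program itself cannot read or write directly.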

david.pfx
  • AddressSanitizer is a software emulator of such a CPU; it has additional tag bits for every memory address. – osgx Jun 06 '14 at 11:14
  • @osgx: Yes, I thought it might. I worked on Burroughs large systems which tagged words not characters. On a tagged architecture a trap value of this kind seems possible to me. – david.pfx Jun 06 '14 at 11:44
  • The original IBM PC *did* have 9-bit bytes, where only 8 bits were generally available to the CPU; a memory store would generally write the 9th bit as a parity bit for the other 8, and a memory read would normally trigger a non-maskable interrupt if the 9th bit didn't match the parity of the other 8, but there were from what I understand some rarely-used diagnostic registers which would change that behavior. – supercat Jan 14 '15 at 21:21