
I saw the following example on cppreference.com

```cpp
int x;     // OK: the value of x is indeterminate
int y = x; // undefined behavior
```

Here, `int y = x;` is undefined behavior because `x` is uninitialized.

But,

```cpp
unsigned char c;     // OK: the value of c is indeterminate
unsigned char d = c; // OK: the value of d is indeterminate
```

Here, `unsigned char d = c;` is not undefined behavior; `d` simply gets an indeterminate value, even though `c` is also an uninitialized variable.

So why does copying an uninitialized `unsigned char` only give `d` an indeterminate value, instead of being undefined behavior as in the `int` case?

divibisan
msc
  • `int` is a type which can have trap representations, and the address of `x` was never taken. – Sourav Ghosh Aug 04 '17 at 12:10
  • It explains this in the box just above the example. Is there a particular part of that which doesn't make sense? – Aug 04 '17 at 12:10
  • @SouravGhosh pardon me if I am missing something, but how does taking the address of `x` change anything? – Ajay Brahmakshatriya Aug 04 '17 at 12:12
  • @AjayBrahmakshatriya [this helps?](https://stackoverflow.com/q/11962457/2173917) – Sourav Ghosh Aug 04 '17 at 12:14
  • @SouravGhosh even if `x` has its address taken, it can start in a register and be spilled to memory before the statement containing `&x`. So I think it applies in that case too, but I am not sure. What do you think? – Ajay Brahmakshatriya Aug 04 '17 at 12:21
  • @rsp Are you asking _why_ `unsigned char` is different, or are you just asking for a plain statement of the difference, such as that given by JETM? This is actually quite an unclear question at the moment. – underscore_d Aug 04 '17 at 12:22
  • @AjayBrahmakshatriya My comment was mostly aimed at the first snippet: if `x` had its address taken, it would be indeterminate, not UB. – Sourav Ghosh Aug 04 '17 at 12:22
  • @SouravGhosh yes, same. It can be UB too. Even if `x` has its address taken, it can live in a register until the `&x`, and in that case the register can be in an "uninitialized" state, which would be a trap representation. – Ajay Brahmakshatriya Aug 04 '17 at 12:25
  • @AjayBrahmakshatriya Yes, all the wording indicates that it's the value that exhibits UB or not; the address or lack thereof does not appear to be relevant at all. – underscore_d Aug 04 '17 at 12:26
  • But now I'm wondering myself: if the problem is about registers, why can't `unsigned char` also have this register state, making the exception possible? – Aconcagua Aug 04 '17 at 12:27
  • @Aconcagua perhaps because `unsigned char` is guaranteed not to have a trap representation. And if the architecture has the register issue, it is up to the compiler to initialize the register with something (a non-trap value) if it is using it for `char`. – Ajay Brahmakshatriya Aug 04 '17 at 12:28
  • @AjayBrahmakshatriya Then why not for all unsigned types? I've never heard of any unsigned type having trap representations... – Aconcagua Aug 04 '17 at 12:29
  • [This](https://stackoverflow.com/questions/14935722/does-initialization-entail-lvalue-to-rvalue-conversion-is-int-x-x-ub) doesn't explicitly answer the question, but it does contain the answer. `unsigned char` is given special rules in lvalue-to-rvalue conversion, and in this case it is believed that such a conversion is applied. – NathanOliver Aug 04 '17 at 12:32
  • @Aconcagua other unsigned types usually don't have trap representations, but the standard does not rule them out. If you are asking why the standard didn't make such a guarantee, I am not sure; perhaps there is nothing to gain from it. `char` is required to have this property because at least one type must have it, for byte-wise copying etc. – Ajay Brahmakshatriya Aug 04 '17 at 12:32
  • Note the "(Since C++14)". – molbdnilo Aug 04 '17 at 12:33
  • There are no trap representations in C++. That's a C thing. – M.M Aug 04 '17 at 12:59
  • @M.M: There may be trap representations in C++, but only in programs with undefined behavior. (If you give me a rule that says there are no trap representations, I remind you that rule does not apply during UB.) – Ben Voigt Aug 04 '17 at 16:42
  • @BenVoigt by "C++" I mean "standard C++". The behaviour of programs with UB is not covered by the standard – M.M Aug 04 '17 at 23:50
  • That's my point. You can't say standard C++ has trap representations. You also can't say it does not. You only know it does not define their behavior. – Ben Voigt Aug 05 '17 at 00:00
  • @BenVoigt: Using a character pointer to populate a PODS, and then reading elements out using their defined types, will yield defined behavior in cases where the bit patterns represent valid values for their respective types, and Undefined Behavior in cases where they do not. What terminology would you suggest to describe the latter situation, and the fact it can only occur with types for which some bit patterns aren't valid? – supercat Aug 25 '17 at 15:44
  • Also see [Has C++ standard changed with respect to the use of indeterminate values and undefined behavior in C++14?](https://stackoverflow.com/q/23415661/1708801) in particular the comments – Shafik Yaghmour Jan 26 '18 at 17:24

4 Answers


Online references like cppreference.com are good up to a point, but errors or misinterpretations do occasionally slip through. So when dealing with such oddities, it is always a good idea to go to the official C++ standard.

N3936 (the C++14 working draft)

§8.5 Initializers [dcl.init]

12 [...] When storage for an object with automatic or dynamic storage duration is obtained, the object has an indeterminate value, and if no initialization is performed for the object, that object retains an indeterminate value until that value is replaced (5.17). [...] If an indeterminate value is produced by an evaluation, the behavior is undefined except in the following cases:

  • If an indeterminate value of unsigned narrow character type (3.9.1) is produced by the evaluation of

    • [...]

    • the operand of a cast or conversion to an unsigned narrow character type (4.7, 5.2.3, 5.2.9, 5.4)

    • [...]

    then the result of the operation is an indeterminate value.

  • If an indeterminate value of unsigned narrow character type is produced by the evaluation of the right operand of a simple assignment operator (5.17) whose first operand is an lvalue of unsigned narrow character type, an indeterminate value replaces the value of the object referred to by the left operand.

  • If an indeterminate value of unsigned narrow character type is produced by the evaluation of the initialization expression when initializing an object of unsigned narrow character type, that object is initialized to an indeterminate value.

Example:

```cpp
int f(bool b) {
  unsigned char c;
  unsigned char d = c; // OK, d has an indeterminate value
  int e = d;           // undefined behavior
  return b ? d : 0;    // undefined behavior if b is true
}
```

So (to my big surprise) the standard backs this up.

As for why, the most likely reason can also be found in the standard:

§3.9.1 Fundamental types [basic.fundamental]

1 [...] For unsigned narrow character types, all possible bit patterns of the value representation represent numbers. These requirements do not hold for other types.
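As a minimal sketch of what that guarantee buys (the helper names `object_bytes` and `from_bytes` are hypothetical, and the types are assumed to be trivially copyable): because every bit pattern of `unsigned char` is a number, the object representation of any object can be copied out into bytes without ever producing an invalid value.

```cpp
#include <array>
#include <climits>
#include <cstdint>
#include <cstring>
#include <limits>
#include <type_traits>

// For narrow character types all bits are value bits, so unsigned char
// has exactly 2^CHAR_BIT distinct values and no invalid bit patterns.
static_assert(std::numeric_limits<unsigned char>::digits == CHAR_BIT,
              "every bit of unsigned char is a value bit");

// Hypothetical helper: copy the object representation of `value` into bytes.
// Whatever bits the object holds, each byte is a valid unsigned char value.
template <typename T>
std::array<unsigned char, sizeof(T)> object_bytes(const T& value) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "object_bytes requires a trivially copyable type");
    std::array<unsigned char, sizeof(T)> bytes{};
    std::memcpy(bytes.data(), &value, sizeof(T));
    return bytes;
}

// Hypothetical helper: rebuild a T from bytes produced by object_bytes<T>.
// The bytes came from a real T, so they form a valid T representation.
template <typename T>
T from_bytes(const std::array<unsigned char, sizeof(T)>& bytes) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "from_bytes requires a trivially copyable type");
    T value;
    std::memcpy(&value, bytes.data(), sizeof(T));
    return value;
}
```

Going the other way is where the guarantee stops: `from_bytes` is only safe here because the bytes came from a real `T`. For types other than unsigned narrow characters, an arbitrary byte pattern need not be a valid value.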


As a side note, I just realized this can be used by an evil interviewer:

Q. Can you, with well-defined behavior, change the valid value of an object to an indeterminate value? If yes, how?

A.

```cpp
unsigned char ind;
unsigned char x = 24;
x = ind; // x had a valid value, now x has an indeterminate value
```
bolov
  • According to the rationale, all bit patterns for `unsigned char` are valid, so why does `int e = d;` have to be undefined, as opposed to an unspecified value in the range of unsigned char? – M.M Aug 04 '17 at 13:31
  • @M.M maybe because evaluating `d` when `d` has an indeterminate value is UB, as it may be a bit pattern that is not a valid value representation for its type. Maybe they just wanted to keep it somewhat simpler. We can only speculate unless an official justification surfaces, or at least a discussion by standard committee members. – bolov Aug 04 '17 at 13:35
  • @M.M e.g. a fictional platform. This platform has a strange encoding of `int` where there is an invalid bit pattern for `int`. Conversion from `int` to `char` is done by a fictional instruction `movic`. On this platform `movic` issues a hardware exception if the source (the `int`) has an invalid value. This platform could theoretically exist and would theoretically be supported by the standard, hence the need for UB. I know it's a stretch, but C++ is notorious for wanting to support esoteric hypothetical hardware. – bolov Aug 04 '17 at 13:42
  • @M.M I misread: you are talking about `unsigned char` to `int`. I was talking about the reverse, but I think the principle can still apply. – bolov Aug 04 '17 at 14:00
  • @bolov that fictional platform's name was Itanium. Raymond Chen had a blog post ["Uninitialized garbage on ia64 can be deadly"](https://blogs.msdn.microsoft.com/oldnewthing/20040119-00/?p=41003/) about that. – Cubbi Aug 04 '17 at 14:21
  • For your evil interview question, I can just do it with placement new. – T.C. Aug 04 '17 at 17:05
  • Might want to note that the clause you quote only applies to C++14 and later. Before that, accessing the value of an uninitialised `unsigned char` was formally undefined. – Peter Aug 06 '17 at 23:15
  • @Cubbi: No need for Itanium. The most straightforward implementation of `uint16_t volatile q; uint16_t foo(uint32_t x, uint32_t y) { uint16_t result; if (mode) result=q; return result; }` on the ARM would return `mode` if `q` is zero, without clipping it to the range 0-65535. The Standard would have no way of allowing a `uint16_t` function to return a value outside the range 0-65535 in the mode==0 case except by saying that such a case would invoke UB. – supercat Sep 17 '17 at 18:42
  • Also see [Has C++ standard changed with respect to the use of indeterminate values and undefined behavior in C++14?](https://stackoverflow.com/q/23415661/1708801) especially the comments – Shafik Yaghmour Jan 26 '18 at 17:25
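T.C.'s placement-new route for the interview question can be sketched as follows, assuming a plain `int`; `reset_then_reuse` is a hypothetical name. A default-initializing placement new re-creates the object with an indeterminate value, which is well-defined as long as nothing reads that value before a new one is stored.

```cpp
#include <new>

// Hypothetical demo of the placement-new route: give an int that holds a
// valid value an indeterminate value, using only well-defined operations.
int reset_then_reuse() {
    int x = 24;  // x holds a valid value
    // Default-initializing placement new: the new int's value is indeterminate.
    int* p = ::new (static_cast<void*>(&x)) int;
    // Reading *p now would be undefined behavior (int is not an unsigned
    // narrow character type), so store a value before any read.
    *p = 7;
    return *p;
}
```

Unlike the `unsigned char` version, this works for any type, but the indeterminate `int` must not be read at all until it is overwritten.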

From the page you referenced: assigning from an indeterminate value is undefined behavior, except in this case:

If the indeterminate value of unsigned narrow character type or std::byte is assigned to another variable with unsigned narrow character type or std::byte (the value of the variable becomes indeterminate, but the behavior is not undefined)

I believe this is because default initialization may place any combination of bits into the variable, and while the standard guarantees that every possible bit pattern of an unsigned narrow character type represents a value, there is no such guarantee for other types.
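To make the contrast concrete, here is a minimal sketch of validating a raw byte before treating it as a `bool`, a type with far fewer values than bit patterns. It assumes a one-byte `bool` stored as 0 or 1, which is the common ABI convention rather than a standard guarantee; `bool_from_byte` is a hypothetical helper.

```cpp
#include <cstring>

// Sketch assumption: bool occupies one byte (implementation-defined).
static_assert(sizeof(bool) == 1, "this sketch assumes a one-byte bool");

// Hypothetical helper. Any unsigned char value is fine to hold and inspect,
// but only the bytes 0 and 1 are accepted as bool representations here,
// following the common ABI convention (not a standard guarantee).
bool bool_from_byte(unsigned char b, bool& out) {
    if (b > 1) {
        return false;  // valid as unsigned char, rejected as a bool pattern
    }
    std::memcpy(&out, &b, sizeof out);  // 0 -> false, 1 -> true under the assumed ABI
    return true;
}
```

The byte itself is always safe to hold and compare because it is an `unsigned char`; only the step that would reinterpret it as a `bool` needs a check.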

joshwilsonvu
  • In particular, this reference: http://eel.is/c++draft/basic.fundamental#1 _For unsigned narrow character types, each possible bit pattern of the value representation represents a distinct number. These requirements do not hold for other types._ – Niall Aug 04 '17 at 12:39
  • @Niall: Unsigned character types are guaranteed not to have padding bits or trap representations, and are the only types with those guarantees. That makes them the only types that can *always* offer certain other guarantees. Unfortunately, the Standard says nothing about whether quality implementations on which other types have no padding bits or trap representations, and which could thus extend the other guarantees to those types, should or should not be expected to do so. – supercat Aug 25 '17 at 15:48

From the linked page:

Use of an indeterminate value obtained by default-initializing a non-class variable of any type is undefined behavior [...] except in the following cases:

...

if the indeterminate value of unsigned narrow character type or std::byte is used to initialize another variable with unsigned narrow character type or std::byte;

`unsigned char` is an unsigned narrow character type, so this is one of the exceptions where UB does not occur.

Community
  • We have to read between the lines on this one, but I think the question is rather _why_ `unsigned char` is different from others in this regard, not just the statement that it is. – underscore_d Aug 04 '17 at 12:21
  • Also see [Has C++ standard changed with respect to the use of indeterminate values and undefined behavior in C++14?](https://stackoverflow.com/q/23415661/1708801) especially the comments – Shafik Yaghmour Jan 26 '18 at 17:29

Two related, useful features of C that carried over into C++ are:

  1. Objects can be copied by copying all of the individual bytes contained therein.

  2. Structure-type objects can be safely copied in their entirety even when some of the objects therein do not hold defined values, provided that no attempt is made to read the undefined portions or copies thereof outside the context of whole-structure copying or individual-byte access.

On most platforms, there's no particular reason why the same guarantees could not and should not be extended to other types as well, but the authors of the C Standard only sought to define guarantees that would be applicable on all platforms, and the authors of the C++ Standard have simply followed C's behavior.
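The first feature, and how it meets the C++14 rule quoted in the first answer, can be sketched as a byte-by-byte copy through `unsigned char`. Each step assigns an `unsigned char` value to an `unsigned char` object, which stays well-defined even when the source byte is indeterminate (an unset member or padding). `Record`, `copy_bytes` and `partial_copy_preserves` are hypothetical names, and the sketch only reads back members that were actually initialized.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical example type: trivially copyable, possibly with padding.
struct Record {
    int id;
    double score;  // deliberately left unset in partial_copy_preserves
    char tag;
};

// Copy an object one byte at a time through unsigned char lvalues.
// Each assignment copies an unsigned char value into an unsigned char
// object, which remains well-defined (C++14 and later) even when the
// source byte is indeterminate.
template <typename T>
void copy_bytes(T& dst, const T& src) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "byte-wise copy requires a trivially copyable type");
    auto* d = reinterpret_cast<unsigned char*>(&dst);
    const auto* s = reinterpret_cast<const unsigned char*>(&src);
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        d[i] = s[i];
    }
}

// `score` and any padding are never written, yet the byte-wise copy is fine.
// Only the initialized members are read back afterwards.
bool partial_copy_preserves(int id, char tag) {
    Record src;  // all members start with indeterminate values
    src.id = id;
    src.tag = tag;
    Record dst;
    copy_bytes(dst, src);
    return dst.id == id && dst.tag == tag;
}
```

In real code `std::memcpy` does the same job; the explicit loop just makes the individual `unsigned char` assignments visible.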

supercat
  • the question is tagged c++ – bolov Aug 06 '17 at 22:07
  • @bolov: The features I described existed in C long before C++ was conceived much less standardized. I see no reason to believe that the behavior is defined in C++ for any reason other than that it was defined that way in C first. That would in turn imply that the reasons it's defined in C are the same reasons it's defined in C++. – supercat Aug 06 '17 at 22:15