
Various esteemed, high-rep users on SO keep insisting that reading a variable with an indeterminate value "is always UB". So where exactly is this mentioned in the C standard?

It is very clear that an indeterminate value could either be an unspecified value or a trap representation:

3.19.2
indeterminate value
either an unspecified value or a trap representation

3.19.3
unspecified value
valid value of the relevant type where this International Standard imposes no requirements on which value is chosen in any instance
NOTE An unspecified value cannot be a trap representation.

3.19.4
trap representation
an object representation that need not represent a value of the object type

It is also clear that reading a trap representation invokes undefined behavior, 6.2.6.1:

Certain object representations need not represent a value of the object type. If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined. If such a representation is produced by a side effect that modifies all or any part of the object by an lvalue expression that does not have character type, the behavior is undefined.50) Such a representation is called a trap representation.
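The character-type exemption in 6.2.6.1 is what makes byte-wise inspection of any object permissible. A minimal sketch, assuming a hypothetical helper `byte_sum` (not from the Standard or the question): every read goes through a `const unsigned char *`, so no lvalue of non-character type ever reads a possibly-trapping representation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sums the bytes of an object's representation.
   Each byte is read through an lvalue of type unsigned char, so the
   6.2.6.1 rule quoted above does not apply even if the bytes would form
   a trap representation for the object's declared type. */
unsigned long byte_sum(const void *obj, size_t size)
{
    const unsigned char *bytes = obj;
    unsigned long sum = 0;
    for (size_t i = 0; i < size; i++)
        sum += bytes[i];
    return sum;
}
```

For example, with `uint32_t v = 0x01020304;` the call `byte_sum(&v, sizeof v)` should give 10 on any byte order, assuming `CHAR_BIT` is 8.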

However, an indeterminate value does not necessarily contain a trap representation. In fact, trap representations are very rare for systems using two's complement.

Where in the C standard does it actually say that reading an indeterminate value invokes undefined behavior?

I was reading the non-normative Annex J of C11 and found that this is indeed listed as one case of UB:

The value of an object with automatic storage duration is used while it is indeterminate (6.2.4, 6.7.9, 6.8).

However, the listed sections are irrelevant. 6.2.4 only states rules regarding lifetime and when a variable's value becomes indeterminate. Similarly, 6.7.9 is about initialization and states how a variable's value becomes indeterminate. 6.8 seems mostly irrelevant. None of the sections contains any normative text saying that accessing an indeterminate value can lead to UB. Is this a defect in Annex J?

There is however some relevant, normative text in 6.3.2.1 regarding lvalues:

If the lvalue designates an object of automatic storage duration that could have been declared with the register storage class (never had its address taken), and that object is uninitialized (not declared with an initializer and no assignment to it has been performed prior to use), the behavior is undefined.

But that is a special case, which only applies to variables of automatic storage duration that never had their address taken. I have always thought that this section of 6.3.2.1 is the only case of UB regarding indeterminate values (that are not trap representations). But people keep insisting that "it is always UB". So where exactly is this mentioned?
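To make the distinction concrete, here is a small sketch (the function name is hypothetical). The 6.3.2.1 case is shown only in a comment, because executing it would be undefined. Once the object's address is taken, that special case no longer applies, and reading the bytes through `unsigned char`, which has no trap representations, should yield an unspecified value on the reading argued above rather than UB.

```c
#include <limits.h>

/* Illustration of C11 6.3.2.1p2, not a recommended practice.
 *
 *   int a;        // automatic storage, address never taken
 *   return a;     // undefined: 'a' could have been declared register
 *
 * Taking the address of the object removes it from that special case. */
unsigned char first_byte_of_indeterminate_int(void)
{
    int x;                                               /* no initializer: indeterminate */
    const unsigned char *p = (const unsigned char *)&x;  /* address taken */
    /* unsigned char has no padding bits, hence no trap representations,
       so this read gives some unspecified value in [0, UCHAR_MAX]. */
    return p[0];
}
```

Compilers may still warn about the uninitialized read; the sketch only illustrates where the Standard's wording draws the line.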

Lundin
  • Is this perhaps yet another case where people mix up the C and C++ standards? Is C++ perhaps more explicit? – Lundin Nov 14 '16 at 08:59
  • 1
    I wonder a bit about the last citation. So, just by taking the address of a variable we can change UB magically to legal behaviour? Doesn't make much sense to me. – too honest for this site Nov 14 '16 at 09:21
  • 1
    Here is a nice explanation: http://stackoverflow.com/a/25074258/4082723 – 2501 Nov 14 '16 at 09:25
  • 1
  • Another possible duplicate: https://stackoverflow.com/questions/37204530/is-using-any-indeterminate-value-undefined-or-just-those-stored-in-objects-with – Theodoros Chatzigiannakis Nov 14 '16 at 09:47
  • @TheodorosChatzigiannakis That doesn't answer the question. – Lundin Nov 14 '16 at 10:15
  • Please be a little more specific. Do you perceive it as a different question or do you perceive the answer presented there as inadequate? – Theodoros Chatzigiannakis Nov 14 '16 at 10:17
  • @Lundin You have pretty much quoted everything related to this from the standard. I don't know what else you wanted as a "proof". What is it in your question that's not answered by any of the 3 linked dups ? – P.P Nov 14 '16 at 10:20
  • @TheodorosChatzigiannakis That question contained no answers which cited the relevant part of the standard saying that reading an indeterminate value invokes UB. (Ironically, your question contains a long answer by yours sincerely which was deleted since I misunderstood the question slightly, but essentially that answer contains everything of the above.) – Lundin Nov 14 '16 at 10:21
  • @P.P. I suspect that the answer will be, "it is not always UB". Which I've kept saying all over SO, then get down-voted by people saying "it is always UB". I'm getting tired of that knee-jerk reaction. Therefore I posted this question. – Lundin Nov 14 '16 at 10:25
  • 3
    @Lundin It's probably said as "always UB" because there's no safe way to say some indeterminate value is not a trap representation without going into specific implementation details or giving allowance for specific types (that can't have trap representation). So, IMO, it's always best to treat it as UB even though it's not entirely accurate to say so (I mean what else can you safely do with an indeterminate value?). To your question, there's "nothing more in the C standard regarding this" is the answer as you already quoted everything. – P.P Nov 14 '16 at 10:34
  • @P.P. The thing is, trap representations barely exist on any mainstream systems. At least I have never worked with such a system myself. It seems to mainly be a thing of one's complement and sign & magnitude systems. Then it is not helpful at all to have an "its always UB" attitude, when in reality, there will just be a harmless, unspecified value. I think C++ is different, since it states things like code containing UB may be optimized away. Perhaps that's where the SO bandwagon attitude is coming from. – Lundin Nov 14 '16 at 10:39
  • @Lundin I see. It is very likely though that if the correct answer really is *"not every usage of an indeterminate value invokes undefined behavior"*, then there might not be any specific places in the standard where it's spelled out. You seem to have found all the paragraphs that are relevant to your question already, so the answer could be just *"you're right, as per your citations"*. – Theodoros Chatzigiannakis Nov 14 '16 at 12:25
  • @TheodorosChatzigiannakis That might be true, though I'll leave this question open for a week or so, just in case I have missed/misunderstood something. – Lundin Nov 14 '16 at 12:29
  • The posted question is off-topic for stackoverflow. Suggest the language lawyer site. – user3629249 Nov 16 '16 at 07:40
  • @user3629249 That's the dumbest thing I've heard. Programming questions are now off-topic for SO? Not because site policies say so, but because you say so? [Read this](http://stackoverflow.com/help/on-topic). – Lundin Nov 16 '16 at 07:43
  • this is NOT a programming question, it is asking for some detail about the C standard, which means it is asking for a tutorial on the C programming standard, which is NOT a programming question – user3629249 Nov 16 '16 at 08:17
  • @user3629249 Again, read the link I posted. My question is both "a specific programming problem" and "a practical, answerable problem that is unique to software development". Arguably, programming languages are also tools commonly used by programmers. Also, why do you think SO has a tag called language-lawyer? Because such questions are on-topic, perhaps? Point out the policy or meta post that labels such questions off-topic, or otherwise stop trolling. – Lundin Nov 16 '16 at 08:25
  • @user3629249 Since you know the term "language lawyer", you know that these kinds of questions have always been considered on-topic here. And it's not a tutorial-seeking question by any stretch -- rather, it asks for clarification regarding a specific matter from a specific document. – Theodoros Chatzigiannakis Nov 18 '16 at 09:36
  • OK, I've given this some time, but no one has proven the above assumptions wrong. I have chosen to close this as a duplicate of [(Why) is using an uninitialized variable undefined behavior in C?](http://stackoverflow.com/questions/11962457/why-is-using-an-uninitialized-variable-undefined-behavior-in-c) and encourage everyone to use that question as the "canonical duplicate" for questions regarding indeterminate values. I'll post an additional answer to that question. – Lundin Nov 18 '16 at 10:20
  • [Answer posted here](http://stackoverflow.com/a/40674888/584518). I'll now go on a downvote-spree on any answers claiming "it is always UB" without stating why. – Lundin Nov 18 '16 at 10:36
  • @Lundin If you are not in a hurry to close it, please consider leaving it open for a week or two. I've actually started reading the C standard cover to cover in the hopes that I'll be able to confidently say whether it's in there or not. – Theodoros Chatzigiannakis Nov 18 '16 at 10:43
  • @TheodorosChatzigiannakis If you do come up with something, please nudge me and I'll re-open the question. – Lundin Nov 18 '16 at 10:44
  • Despite being a duplicate, this is a good question. Came here from C11, J.2 Undefined behavior: "The value of a pointer to an object whose lifetime has ended is used (6.2.4)." – pmor Aug 01 '22 at 17:47
  • 1
    @pmor I ended up posting an answer [here](https://stackoverflow.com/a/40674888/5845189) below the linked duplicate. – Lundin Aug 05 '22 at 22:18

3 Answers


As far as I know, there is nothing in the standard that says that using an indeterminate value is always undefined behavior.

The cases that are spelled out as invoking undefined behavior are:

  • If the value happens to be a trap representation.
  • If the indeterminate value is held by an object of automatic storage duration that could have been declared register (its address is never taken).
  • If the value is a pointer to an object whose lifetime has ended.

As an example, the C standard specifies that the type unsigned char has no padding bits and therefore none of its values can ever be a trap representation.

Portable implementations of functions such as memcpy take advantage of this fact to perform a copy of any value, including indeterminate values. Those values could potentially be trap representations when used as values of a type that contains padding bits, but they are simply unspecified when used as values of unsigned char.
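A byte-wise copy in that spirit might look like the sketch below. The name `bytewise_copy` is hypothetical, and this is not any real library's `memcpy`; it just shows that every access uses `unsigned char`, so indeterminate source bytes are copied as unspecified values rather than being read through a type that could trap.

```c
#include <stddef.h>

/* Hypothetical memcpy-like routine. All reads and writes use unsigned
   char, which has no padding bits and therefore no trap representations,
   so copying bytes whose values are indeterminate (e.g. struct padding)
   only propagates unspecified byte values. */
void *bytewise_copy(void *restrict dst, const void *restrict src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}
```

A typical case is copying a `struct { char c; int i; }`: its padding bytes are indeterminate, yet copying the whole struct this way is well defined under the reasoning above.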


I believe that it is erroneous to assume that if something could invoke undefined behavior then it does invoke undefined behavior when the program has no safe way of checking. Consider the following example:

int read(int* array, int n, int i)
{
    if (0 <= i)
        if (i < n)
            return array[i];
    return 0;
}

In this case, the read function has no safe way of checking whether array really is of (at least) length n. Clearly, if the compiler considered these possible UB operations as definite UB, it would be nearly impossible to write any pointer code.

More generally, if the compiler cannot prove that something is UB, it has to assume that it isn't UB, otherwise it risks breaking conforming programs.


The only case where the possibility is treated as a certainty is that of objects with automatic storage duration. I think it's reasonable to assume that the reason is that those cases can be statically rejected, since all the information the compiler needs can be obtained through local flow analysis.

On the other hand, declaring it as UB for non-automatic storage objects would not give the compiler any useful information in terms of optimizations or portability (in the general case). Thus, the standard probably doesn't mention those cases because it wouldn't change anything in realistic implementations anyway.

Theodoros Chatzigiannakis

To allow the best blend of optimization opportunities and useful semantics, types which have no trap representations should have Indeterminate Values subdivided into three kinds:

  1. The first read will yield any value that could result from an unspecified bit pattern; subsequent reads would be guaranteed to yield the same value. This would be similar to "Unspecified value", except that the Standard doesn't generally distinguish between types which do and don't have trap representations, and in cases where the Standard calls for "Unspecified Value" it requires that an implementation ensure the value is not a trap representation; in the general case, that would require that an implementation include code to guard against certain bit patterns.

  2. Each read may independently yield any value that could result from an unspecified bit pattern.

  3. The value read, and the result of most computations performed upon it, may behave non-deterministically as though the read had yielded any possible value.

Unfortunately, the Standard doesn't make such distinctions, and there is some disagreement about what it calls for. I would suggest that #2 should be the default, but it should be possible for code to indicate all places where code needs to force the compiler to pick a concrete value, and indicate that a compiler may use #3-style semantics everywhere else. For example, if code for a collection of distinct 16-bit values stored as:

struct COLLECTION { size_t count; uint16_t values[65536], locations[65536]; };

maintains the invariant that for each i < count, locations[values[i]]==i, it should be possible to initialize such a structure merely by setting "count" to zero, even if the storage had previously been used as some other type. If casts are specified as always yielding concrete values, code which wants to see if something is in the collection could use:

uint32_t index = (uint32_t)(collection->locations[value]);
if (index < collection->count && collection->values[index] == value)
  { /* ... value was found */ }

It would be acceptable to have the above code arbitrarily yield any number for "index" each time it reads an item from the array, but it would be essential that both uses of "index" in the second line use the same value.
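The structure described above is what is often called a sparse set. A minimal sketch of it follows, keeping the answer's `struct COLLECTION` layout; the helper names `collection_clear`, `collection_contains` and `collection_add` are hypothetical. It assumes reads of never-written `locations` entries behave as kind #1 or #2; the sketch does nothing to guard against a compiler that applies #3-style semantics.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Layout from the answer. Invariant: for each i < count,
   locations[values[i]] == i. */
struct COLLECTION { size_t count; uint16_t values[65536], locations[65536]; };

/* O(1) "initialization": only count is written. */
void collection_clear(struct COLLECTION *collection)
{
    collection->count = 0;
}

bool collection_contains(const struct COLLECTION *collection, uint16_t value)
{
    /* locations[value] may never have been written. uint16_t has no
       padding bits, so it cannot hold a trap representation; the check
       only needs index to be one concrete value used by both tests. */
    uint32_t index = (uint32_t)(collection->locations[value]);
    return index < collection->count && collection->values[index] == value;
}

/* Adds value if absent; returns false if it was already present. */
bool collection_add(struct COLLECTION *collection, uint16_t value)
{
    if (collection_contains(collection, value))
        return false;
    size_t i = collection->count++;
    collection->values[i] = value;
    collection->locations[value] = (uint16_t)i;
    return true;
}
```

In the intended use the storage would be reused memory that is not zeroed first; zero-filled static storage (as one might use in a quick test) sidesteps the indeterminate-value question entirely.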

Unfortunately, some compiler writers seem to think compilers should treat all indeterminate values as #3, while some algorithms require #1 and some require #2, and there's no real way to distinguish the varying requirements.

supercat

3.19.2 permits an indeterminate value to be a trap representation, and both reading and writing such a representation through a non-character lvalue are undefined behaviour (6.2.6.1).

Your platform may give you guarantees (e.g. that integer types never have trap representations) but that is not required by the Standard, and if you rely on that, your code loses some portability. That's a valid choice, but shouldn't be made in ignorance.
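If you do choose to rely on such a guarantee, one option is to check it rather than assume it. A sketch, with a hypothetical function name: it counts the value bits in `UINT_MAX` and compares them with the object's total bit count. If `unsigned int` has no padding bits, every bit pattern is a valid value, so the type has no trap representations.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical check: true if unsigned int has no padding bits, i.e.
   every bit of the object is a value bit, so no bit pattern can be a
   trap representation for this type. */
bool unsigned_int_has_no_padding(void)
{
    unsigned int max = UINT_MAX;
    unsigned int value_bits = 0;
    while (max != 0) {
        value_bits++;
        max >>= 1;
    }
    return value_bits == sizeof(unsigned int) * CHAR_BIT;
}
```

This only covers `unsigned int`; signed types and floating-point types would need separate reasoning.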

More systems have trap representations for floating-point types than for integer types, but C programs may be run on processors that track register validity - see (Why) is using an uninitialized variable undefined behavior in C?. This degree of latitude is the principal reason for C's wide adoption across many hardware architectures.

Toby Speight