4

There have been several discussions on whether accessing uninitialized variables yields undefined behaviour (e.g. in this SO answer), and I've looked through this online C11 draft standard concerning indeterminate values and undefined behaviour, too.

From what I found on SO and in the standard (maybe I've overlooked something), undefined behaviour when accessing uninitialized variables is tied either to trap representations or to the possibility that the variable could have an (implicit) register storage class.

But what if the variable in question is an array (which cannot get register storage class), and its element type cannot have a trap representation (like a character type, according to 6.2.6.1p5)?

Is accessing such a value then still UB?

#include <stdio.h>

int main(void) {
    char output[10];
    for (int i=0; i<10; i+= 2) {  // initializing every 2nd element only
        output[i] = '0' + i;
    }
    char c = output[1]; // accesses something "uninitialized"; But is it UB?
    printf("%c\n", c);  // prints probably garbage; But what if I don't care?
    return 0;
}
Stephan Lechner
  • "What if I don't care that I'm accessing garbage?" Would you care if you knew that "garbage" could contain potentially sensitive information that hasn't been overwritten? – Patrick Roberts Aug 21 '17 at 22:21
  • @Patrick Roberts: agree from a practical perspective; yet it's a question about (formal) UB... – Stephan Lechner Aug 21 '17 at 22:24
  • I think your analysis of the situation is correct. It is not UB, and if you find any unexpected behavior you have a pretty strong case for a compiler bug. – Petr Skocik Aug 21 '17 at 22:25
  • @PSkocik it is UB – 0___________ Aug 21 '17 at 22:27
  • @PeterJ OK, then make it an answer where you explain why, and back it up with quotes from the standard. I will gladly upvote it. – Petr Skocik Aug 21 '17 at 22:28
  • It has been answered 10000000000 times already. Same as dereferencing uninitialised auto pointers. In this case UB may be more visible when the core gets dumped. Or char a[10]; printf("%s",a); – 0___________ Aug 21 '17 at 22:32
  • @PeterJ Then why not link one of those 10000000000 times where it was answered? – interjay Aug 21 '17 at 22:33
  • @PeterJ An uninitialized auto pointer usually could have been declared `register`. The array in this piece of code can't, and that makes the difference, as far as I understand – Petr Skocik Aug 21 '17 at 22:33
  • I'm afraid you do not. – 0___________ Aug 21 '17 at 22:35
  • @PeterJ: This particular question has not been answered many times. In fact, even simpler versions of this question using scalars instead of arrays have been answered but the answers are contradictory (some say UB, others say not UB, both look plausibly correct to regular humans). – John Zwinck Aug 21 '17 at 22:44
  • This specific question also has to be considered in light of [**C11 Standard (draft n1570) § 6.5.2.1 Array Subscripting (4-explanation)**](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf) where `output` is first converted to *pointer to int* and *accessed* through `*(output + 1)` (or `output[1]`) which is uninitialized then, read together with [**§ 6.3.2.1 Lvalues, arrays, and function designators (2)**](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf) would lead to the conclusion it is *Undefined Behavior*. – David C. Rankin Aug 22 '17 at 00:12
  • @PeterJ [Here](https://stackoverflow.com/questions/45697843/why-do-i-get-an-endless-loop-from-my-code/45697931?noredirect=1#comment78356243_45697931) you were arguing that it is __not__ UB. Can you decide on one? – Ajay Brahmakshatriya Aug 22 '17 at 04:00
  • @David C. Rankin: But 6.3.2.1p2 still requires "...that could have been declared with the register storage class" to invoke UB, right? – Stephan Lechner Aug 22 '17 at 06:29
  • If I was an evil compiler writer, obsessed with exploiting UB for performance gains, I would argue this was UB based on the fact that arrays can legally be declared as `register` (even though this is pointless in practice, because accessing the elements becomes UB) and storing array elements in registers [is an actual optimization compilers perform](https://stackoverflow.com/q/17342881/4137916). Even if your array isn't explicitly declared as such, because it *could* have been declared as such. – Jeroen Mostert Aug 22 '17 at 13:07
  • @StephanLechner, I see what you are saying, but I think the *register storage class* verbiage is a bit awkward. Reading ¶2, arrays are excluded from all but "*If the lvalue designates an object of automatic storage duration that could have been declared with the register storage class (never had its address taken), and that object is uninitialized (not declared with an initializer and no assignment to it has been performed prior to use), the behavior is undefined.*" The *lvalue* assigned being type `int` could have been declared *registered* and being uninitialized is UB. Clear as mud... – David C. Rankin Aug 22 '17 at 18:54
  • Another telling comment is [**C11 Standard (draft n1570) § 6.7.1 (6 & comment 121)**](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf) regarding arrays declared with the *register* storage-class specifier. While implementation defined, there is nothing preventing declaring an array with the *register* storage-class specifier, making all arrays with automatic storage fall into the "*could have been declared with the register storage class*" category, further suggesting that access to *any* uninitialized value is *Undefined Behavior*. Still clear as mud... – David C. Rankin Aug 22 '17 at 21:46
  • @David C. Rankin: Indeed, clear as mud. Arrays as registers seem to be possible, turning any attempt to read from an uninitialized array into (possible) UB; but this would also mean that taking the address of such an array would not be allowed, which is a very common pattern in programs: `char input[100]; scanf("%99s",input)`, right? – Stephan Lechner Aug 22 '17 at 21:58
  • @David C. Rankin: If I understand you right, another argument is that "an intermediate lvalue" like `output[1]` (i.e. `*(output+1)`) could be understood as an "object of automatic storage duration that could have been declared with the register storage class" ... and is not initialized. But I think that it would get initialized, yet with an indeterminate value? Thereby the effect that microcontrollers have problems with uninitialized registers (which is the root cause of defining this as UB, I think) would not become apparent, right? – Stephan Lechner Aug 22 '17 at 22:02
  • That's how I read it. You can declare an array with the *register* specifier, but then the only operators that can be applied are `sizeof` and `_Alignof`. The people writing the standard are obviously not linguists and more likely programmers and engineers (which explains the tortured readability). It would be a hell of a lot clearer if they just included the fact that *register* can be applied to an array with automatic storage. I think that would make the rest more clear. And, the fact that *register* is just a *"suggestion to"* the compiler which is handled in an implementation defined way – David C. Rankin Aug 22 '17 at 22:02
  • @David C. Rankin: And I think all your input is eligible for an answer :-) – Stephan Lechner Aug 22 '17 at 22:07
  • Yes, I thought about that, but then you really have to ask yourself "Do I really feel like playing Russian Roulette with all the 'standards experts' on a language lawyer question?" `:)` I'd feel far more comfortable if the question pertained to the Texas Rules of Civil Procedure as opposed to interpretation of the C11 standard `:)` I'll put an answer together and post it after dinner, then we will await the assault... – David C. Rankin Aug 22 '17 at 22:12

2 Answers

3

This type of question is always a challenge, because it requires interpreting the C standard, which in many respects isn't written for clarity but is the result of deliberation and compromise over what two (or more) competing factions would agree to include. After going through it a number of times, it is clear that far more discussion went into what to include or leave out than into where or how to place it for readability.

Continuing from the comments, I think we can all agree, given how often it has been cited in the comments and answers, that C11 Standard (draft n1570) § 6.3.2.1 Lvalues, arrays, and function designators (¶2) applies:

"If the lvalue designates an object of automatic storage duration that could have been declared with the register storage class (never had its address taken), and that object is uninitialized (not declared with an initializer and no assignment to it has been performed prior to use), the behavior is undefined."

(emphasis mine)

The issue becomes, "Is an array with automatic storage something that could have been declared with the register storage class?"

At first look, the obvious thought is "An array with a register storage-class specifier? That would be pretty dumb, you can't take the address, how would you ever access the values?", given § 6.2.5 Types (comment 36): "The address of such an object is taken implicitly when an array member is accessed."

First thoughts are often wrong: arrays with automatic storage may be declared with the register storage class. See § 6.7.1 (¶6 and comment 121).

The following code is perfectly legal -- while arguably not that useful.

#include <stdio.h>

int main (void) {

    register int a[] = { 1, 2, 3, 4 };
    register size_t n = sizeof a / sizeof (int);

    printf ("n : %zu\n", n);

    return 0;
}

The only operators that can be applied to an array declared with storage-class specifier register are sizeof and _Alignof. (See § 6.7.1, comment 121.)

Given the above, any element of the array that "is uninitialized (not declared with an initializer and no assignment to it has been performed prior to use)" falls under that clause, so "the behavior is undefined."

In your specific case:

    char c = output[1]; // accesses something "uninitialized"; But is it UB?

output[1] designates an object of automatic storage duration that could have been declared with the register storage class (never had its address taken), and that object is uninitialized (not declared with an initializer and no assignment to it has been performed prior to use), so the behavior is undefined.

David C. Rankin
  • "The address of such an object is taken implicitly when an array member is accessed." is comment 36 in my draft, not 31. But it's also irrelevant, because "such an object" refers to "a non-lvalue expression with structure or union type, where the structure or union contains a member with array type", which is clearly not relevant here. (An "array member" is a member of array type, not an array *element*.) – Jeroen Mostert Aug 23 '17 at 14:05
  • Yep typo on the number. Fixing. Whether you call an element of an array an *element* or *member* of the array is largely a matter of taste. Both unambiguously identify one of the *things* that make up the array. – David C. Rankin Aug 23 '17 at 17:30
  • I didn't mean to imply you couldn't call an array element an array member. I meant to say that's *not* how the standard uses this term. See every other time it uses the words "member" and "element". (Also, the context makes it clear "member" here means a member of a structure or union, regardless of anything else.) – Jeroen Mostert Aug 23 '17 at 17:34
  • Point taken, *§6.3.2.1 (3)* would be the better citation as comment 36 is actually attached to the Storage Duration from the page prior and agreed refers to the array as a struct member. – David C. Rankin Aug 23 '17 at 18:38
  • I'm failing to find the long discussion I had about this a couple months ago in another iteration of this question — it's probably been "moved to chat" over my explicit objections to that bad practice, rendering it unsearchable — but the gist was that _what C compilers actually do_ is treat the bald statement in J.2 "The behavior is undefined if [...] a value is used while it is indeterminate" as normative, even though Annex J isn't normative. This applies to all values, however declared or allocated and whether or not they have trap reps, and even to `unsigned char`. – zwol Aug 24 '17 at 14:30
  • ... And I would bet a box of donuts that compiler vendors would claim that that bald statement expresses the true intent of the committee and, to the extent that the normative text does not agree, the standard is defective. – zwol Aug 24 '17 at 14:32
  • And you should get a 1/2 dozen cups of coffee to go with the donuts as I think your bet is safe. When you boil it all down, it comes back to the simple rule that an attempt to access an uninitialized value is undefined behavior -- regardless of the standards gymnastics you have to go through to build a concise citation to that fact for an array element. – David C. Rankin Aug 24 '17 at 20:31
  • This answer appears to reason that because an array can be declared with `register`, as it shows in “The following code is perfectly legal”, then, in `char c = output[1];`, “`output[1]` designates an object of automatic storage duration that could have been declared with register storage class…” However, while declaring the `output` with `register` “could have been” done in different code, it cannot when `output[1]` is used. And this “could have been” by the C standard means “could have been” given the code at hand, not code with the uses of the object removed… – Eric Postpischil Jun 14 '21 at 10:01
  • … E.g., in the code `char a; &a; char x = a;`, `a` could not have been declared `register`, because its address is taken. For the purposes of 6.3.2.1 2, we do not imagine that there “could have been” different code `register char a; char x = a;`. The question is whether there “could have been” code `register char a; &a; char x = a;`. Similarly, in `char c = output[1];`, `output` cannot have been declared `register`. If it had been, then `output[1]` would not be “legal” (would not have behavior defined by the C standard). – Eric Postpischil Jun 14 '21 at 10:03
2

From the C Standard (6.2.6 Representations of types, 6.2.6.1 General):

5 Certain object representations need not represent a value of the object type. If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined....

So for character arrays there is no undefined behavior.

Objects of character types do not have a trap representation.

Vlad from Moscow