
Say I have this code:

```c
#include <string.h>

void foo() {
  char s[10];
  char v1 = s[0]; // UB
  char v2 = s[10]; // also UB
}

void bar() {
  char s[10];
  strcpy(s, "foo");
  char v3 = s[3]; // v3 is zero
  char v4 = s[0]; // v4 is 'f'
  char v5 = s[4]; // What?
}
```

Since the addresses of s[0] through s[3] are accessed in strcpy, and s[0] through s[9] occupy contiguous memory, I suppose the whole array should contain some value (possibly an indeterminate one).
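A quick way to check which elements `strcpy` actually writes is to pre-fill the buffer with a sentinel byte and count what changes. This is a minimal sketch: the helper name `count_written` and the sentinel `'X'` are hypothetical choices. Because the buffer is pre-filled, every element here is determinate, so the sketch only shows *which* bytes `strcpy` touches, not what an unwritten element would otherwise hold.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: fills buf with a sentinel byte, copies src into it
   with strcpy, and counts how many elements no longer hold the sentinel.
   Assumes src fits in buf and does not itself contain the sentinel 'X'. */
size_t count_written(char *buf, size_t len, const char *src)
{
    memset(buf, 'X', len);   /* every element now has a determinate value */
    strcpy(buf, src);        /* writes strlen(src) chars plus the '\0' */

    size_t changed = 0;
    for (size_t i = 0; i < len; i++)
        if (buf[i] != 'X')
            changed++;
    return changed;
}
```

With `char s[10]`, `count_written(s, sizeof s, "foo")` returns 4, and `s[4]` through `s[9]` still hold `'X'`. So `strcpy` writes `s[0]` through `s[3]` (including the terminating `'\0'`) and nothing else, which leaves `s[4]` in `bar()` just as unwritten as `s[0]` is in `foo()`.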

Is the initialization of v5 well-defined? Or does v5 just get an indeterminate value (without triggering any UB)?

What if the array has type int and is still only partially assigned?

iBug
  • `s[4]` has never been assigned, so we are in the same situation as on line `char v1 = s[0]; // UB`. `strcpy` assigns only `s[0]` to `s[3]` here. – Jabberwocky May 04 '18 at 11:37
  • I guess it's well defined that v5/s[4] will contain a value between -128 and 127, whereas which exact value it will hold is undefined – Pras May 04 '18 at 11:41
  • Possible duplicate of [(Why) is using an uninitialized variable undefined behavior?](https://stackoverflow.com/questions/11962457/why-is-using-an-uninitialized-variable-undefined-behavior) – Stargateur May 04 '18 at 11:42
  • Possible duplicate of [In C++, is accessing an uninitialized array unspecified behavior or undefined behavior?](https://stackoverflow.com/questions/49696810/in-c-is-accessing-an-uninitialized-array-unspecified-behavior-or-undefined-be) – Lanting May 04 '18 at 11:43
  • @Stargateur No, absolutely not a dupe. This question is about *whether* it's UB to access an array *that has been taken address of*. – iBug May 04 '18 at 11:43
  • TL;DR: as `char` is guaranteed not to have a trap representation (if I remember correctly), the value is just indeterminate. So "well defined" is a little too much; I'd say it's implementation-defined, and generally it will just contain a garbage value. – Stargateur May 04 '18 at 11:44
  • @iBug And that would just repeat what the dupe says... sorry, but as an expert I'm telling you it's the same issue. – Stargateur May 04 '18 at 11:47
  • @Stargateur IIRC, the only type guaranteed to have no trap value is `unsigned char`, but not its signed version or unknown-signedness version. – iBug May 04 '18 at 11:53
  • @iBug If you're right, that just means the behavior is implementation-defined. Either way, it's up to you to check whether your type has trap representations (and this question seems moot anyway, because you just shouldn't use uninitialized values ;)) – Stargateur May 04 '18 at 11:55
  • @Lanting: C questions do not duplicate C++ questions. The rules are different for the different languages, and the answers must be specific to each language, except when specifically asking about something common to both languages. – Eric Postpischil May 04 '18 at 11:59
  • @Stargateur: I believe character types may have trap representations, except `unsigned char`. However, the C 2011 paragraph that says reading an object that has a trap representation is undefined behavior, 6.2.6.1 5, specifically excludes character types. Thus, reading a character that has a trap representation is not said to be undefined. – Eric Postpischil May 04 '18 at 12:01
  • @iBug: (a) Character types are treated differently in the C standard. The answer to this question would be different if you used an array of `int`. (b) Your premise is flawed. `char v1 = s[0];` does not have undefined behavior. – Eric Postpischil May 04 '18 at 12:03
  • @EricPostpischil So you mean, *none* of the `vx` variables are UB. All is either well-defined or contains indeterminate values? – iBug May 04 '18 at 12:08
  • @EricPostpischil In fact, I'm not sure, as the address of `s` is never taken, `s[0]` is UB in `foo()`, I think. – Stargateur May 04 '18 at 12:16
  • It's undefined to access an uninitialized object if it could have been declared with the `register` storage class. Question is, how does that translate to arrays? It seems arrays are declarable with `register` but the J2 appendix says it's UB to have a register-classified array convert to its first member, but that kind of conversion is how indexing is defined, so it seems register-declared arrays are un-indexable, but since you are indexing, the array couldn't have been declared register, so the dereferences should yield unspecified values but not result in UB. It's weird, though. – Petr Skocik May 04 '18 at 12:19
  • @Stargateur: `s[0]` is defined by the C standard to be `(*((s)+(0)))`. `(s)+(0)` is the address of `s[0]`. It has an address, so it cannot be `register`. – Eric Postpischil May 04 '18 at 12:24
  • @iBug: Yes, the uninitialized elements have indeterminate values. And, for character types, reading them is not said by 6.2.6.1 5 to be undefined behavior even if they contain trap representations. – Eric Postpischil May 04 '18 at 12:26
  • @EricPostpischil Yes, that's what I thought in the first place, but `s` is not the address of `s`, aka `&s`... It's the array itself whose address must be taken, not the address of one of its elements ;) but anyway, PSkocik gives a wonderful explanation that this is not UB because an array can't have the register qualifier. – Stargateur May 04 '18 at 12:28
  • @Stargateur: Aside: As I read the standard, an array can have register qualifier, as long as you never use the array! Anything that lets it get converted to a pointer to its first element is undefined. But you could apply `sizeof` to it. – Eric Postpischil May 04 '18 at 12:31
  • @EricPostpischil That's how I'm reading it too. I wonder why even allow the register storage class there then. I can't think of any use case for it. – Petr Skocik May 04 '18 at 12:36
  • @Michi `s[] = { 'f', 'o', 'o', '\0' };`. `strcpy` copies the NUL terminator. This is clearly stated in the documentation. – Jabberwocky May 04 '18 at 16:10
  • @Michael Walz I deleted my comment, because I misunderstood yours that’s all:)). I know what strcpy does. – Michi May 04 '18 at 19:30
  • @PSkocik: I don't think the authors of the Standard wanted to forbid implementations that could usefully keep a small array in a register (e.g. using a combination of variable shifts and/or bitfield insert/extract instructions) from letting programmers use the `register` keyword to request such storage. On the flip side, they also didn't want to require that implementations accept such code. The normal way for the Standard to describe features that implementations might support, but aren't required to, is to classify those features as Undefined Behavior. – supercat May 04 '18 at 22:06
  • What do you mean by "accessed in strcmp"? There's no strcmp in this code – M.M May 04 '18 at 23:08
  • @supercat Thanks. I'm always forgetting about those implementation-defined well-behaved UB situations – Petr Skocik May 05 '18 at 14:24
  • @PSkocik: The problem, fundamentally, is that the authors of the Standard never thought it necessary to mandate things that seemed like common sense at the time. If an implementation targets an unusual platform where storage that has never been written sometimes acts strangely, one shouldn't expect that the implementation would protect one from that. That does not, however, mean that implementations that don't target such platforms should go out of their way to exploit the Standard's "permission" to do likewise. – supercat May 05 '18 at 16:52
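The point made in the comments that `s[0]` is defined as `(*((s)+(0)))`, so that subscripting goes through the array's address, can be checked directly. A minimal sketch (the function name `subscript_matches_pointer_arith` is hypothetical); the array is fully initialized here so that every read is of a determinate value, and the caller must pass an index below 10:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical check: for index i (must be < 10), the subscript s[i] and
   the pointer arithmetic it is defined as, *((s)+(i)), designate the same
   element, and the array name s converts to &s[0]. */
bool subscript_matches_pointer_arith(size_t i)
{
    char s[10] = "foo";   /* elements after the '\0' are zero-initialized */

    bool same_object = &s[i] == &*((s) + (i));
    bool same_value  = s[i] == *((s) + (i));
    bool decays      = s == &s[0];

    return same_object && same_value && decays;
}
```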

3 Answers

It can't be undefined merely because the char might hold a trap representation: 6.2.6.1p5, which makes reading a trap representation undefined, explicitly excludes lvalues of character type.

It could, however, be undefined because of 6.3.2.1p2, which Annex J.2 summarizes as:

An lvalue designating an object of automatic storage duration that could have been declared with the register storage class is used in a context that requires the value of the designated object, but the object is uninitialized.

So the question is: could the array have been declared with the register storage class?

The answer to that is no, it couldn't have, because you're indexing it. Indexing is defined by 6.5.2.1p2 in terms of the array converting to the address of its first element:

A postfix expression followed by an expression in square brackets [] is a subscripted designation of an element of an array object. The definition of the subscript operator [] is that E1[E2] is identical to (*((E1)+(E2))). Because of the conversion rules that apply to the binary + operator, if E1 is an array object (equivalently, a pointer to the initial element of an array object) and E2 is an integer, E1[E2] designates the E2-th element of E1 (counting from zero).

For a register-classified array, however, that conversion would be undefined, per this bullet point in Annex J.2 (Undefined behavior):

An lvalue having array type is converted to a pointer to the initial element of the array, and the array object has register storage class (6.3.2.1).

Since indexing requires that conversion, the array couldn't have been declared register.

Footnote 121 in 6.7.1 Storage class specifiers further elaborates this:

the address of any part of an object declared with storage-class specifier register cannot be computed, either explicitly (by use of the unary & operator as discussed in 6.5.3.2) or implicitly (by converting an array name to a pointer as discussed in 6.3.2.1). Thus, the only operators that can be applied to an array declared with storage-class specifier register are sizeof and _Alignof

(In other words, while the language allows register arrays, they're essentially unusable.)

Consequently, code like:

```c
char unspecified(void){ char s[1]; return s[0]; }
```

will return an unspecified value but will not render your program's behavior undefined.
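To illustrate footnote 121, here is a sketch of about the only thing you can do with a `register` array: apply `sizeof` to it. The function name is hypothetical, and the commented-out subscript marks the operation that would require converting the array to a pointer; that is undefined per Annex J.2, and compilers may reject it outright.

```c
#include <stddef.h>

/* Hypothetical example: declaring a register array is allowed, and sizeof
   does not convert it to a pointer, so this function is well defined. */
size_t register_array_size(void)
{
    register char r[10];

    /* r[0] = 'f';  <- would convert r to a pointer to r[0]; that is
                       undefined for a register array (Annex J.2), and
                       compilers may reject it outright. */

    return sizeof r;   /* sizeof only inspects the type of r */
}
```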

Petr Skocik
  • This doesn't address at all the `strcpy` issue ... OP seems to think that there might be a difference between `v[0]` in the first case, and the bytes of the array after the end of the string in the second case, so you should cover whether or not those two situations are actually identical – M.M May 04 '18 at 23:11

The authors of the Standard did not think that it was necessary to explicitly describe corner cases which every compiler to date had consistently handled the same way, and for which they saw no reason why any implementation might behave differently if its designer wasn't being deliberately obtuse. Scenarios involving partially-written aggregates fall into this category.

The behavior of array subscripting is defined as taking the address of the array, performing arithmetic on the resulting pointer, and then accessing the resulting address. Personally I think it should be defined as a separate kind of operation with slightly different corner cases from explicitly taking an array's address, doing the pointer arithmetic, and casting the result, but the Standard defines the operation in terms of those steps. As such, a compiler that is not being deliberately obtuse should regard an array which is accessed using the subscript operator as an object whose address is taken, and which may thus be accessed whether or not it has been written. That does, however, still leave open a question about the behavior of such code.

Assuming `unsigned char` is 8 bits and `unsigned` is 24 bits or more, what values could the following return?

```c
unsigned test1(unsigned char *p)
{
  unsigned x=p[0];
  unsigned y=p[0];
  unsigned z=y;
  return x | (y << 8) | (z << 16);
}
unsigned test(void)
{
  unsigned char foo[1];
  return test1(foo); // Note that this takes the address of 'foo'.
}
```

Personally, I doubt there would be any real disadvantage to requiring that code generated for test1 behave as though x, y and z all hold the same value in the range 0..255, or (at absolute minimum) as though y and z hold the same value. I don't think the authors of the Standard expected any non-obtuse implementation to behave otherwise, but the Standard doesn't actually require it, and some people seem to believe that requiring such behavior would unduly restrict optimization.
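For contrast, the case the Standard does pin down is when the byte has been written before it is read: then x, y and z all hold that byte, and the result is the byte repeated in the low three octets. A sketch with the hypothetical names `pack_three` (same body as `test1`) and `test_initialized`, under the same assumptions of an 8-bit `unsigned char` and an `unsigned` of at least 24 bits:

```c
/* Same body as test1: reads p[0] twice and packs three copies. */
unsigned pack_three(const unsigned char *p)
{
    unsigned x = p[0];
    unsigned y = p[0];
    unsigned z = y;
    return x | (y << 8) | (z << 16);
}

/* Hypothetical counterpart to test(): the element is written first, so
   every read of foo[0] yields the same determinate value v. */
unsigned test_initialized(unsigned char v)
{
    unsigned char foo[1] = { v };
    return pack_three(foo);
}
```

For example, `test_initialized(0xAB)` yields `0xABABAB`. The open question in the answer is whether the uninitialized version must at least produce a result of this same repeated-byte shape.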

supercat

Yes, it is undefined behavior.

A partially assigned array is an array containing both initialized and uninitialized memory areas. Reading the uninitialized areas is undefined behavior, just like reading any other uninitialized memory.

Lie Ryan