I'm trying to understand what exactly is undefined in accessing array[6]
Per C 2018 3.4.3, “undefined behavior” means the C standard imposes no requirements. So, undefined behavior is not a specific thing; it means there is a complete lack of the C standard saying what should or should not happen.
array[6] is undefined because pointer arithmetic that goes further than one past the end of an array is not defined. array[6] is specified to be equivalent to *(array+6) (C 2018 6.5.2.1 2), and addition to a pointer is specified in C 2018 6.5.6 8. It says that additions that result in pointing to an element in the array or one beyond the last element are defined, but “otherwise, the behavior is undefined.” So array[6] has undefined behavior because the C standard explicitly says so.
If we consider that other case, pointing one beyond the last array element, then array+5 is defined. However, *(array+5) is not defined, because C 2018 6.5.6 8 tells us that, while array+5 points to the location just beyond the last element of the array, the * operator shall not be applied to it:
If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.
In this context (a paragraph of the standard that is not one of the “Constraints” paragraphs), violating a “shall not” rule means the behavior is undefined, because C 2018 4 2 says:
If a "shall" or "shall not" requirement that appears outside of a constraint or runtime-constraint is violated, the behavior is undefined…
So, exactly what causes this to be undefined is that it violates an explicit rule of the standard, and violating that rule means the behavior is undefined.
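The distinction between forming a pointer and dereferencing it can be sketched in C. This is only an illustration: the five-element size is an assumption inferred from the discussion above (array+5 being one past the end), and the names array, sum_elements, end, and p are hypothetical. The two undefined operations are left in comments so that the code as written stays within defined behavior.

```c
/* Assumed for illustration: the array in question has five elements. */
static int array[5] = {1, 2, 3, 4, 5};

/* Sums the elements, using a one-past-the-end pointer only as a loop bound. */
int sum_elements(void)
{
    int *end = array + 5;   /* defined: points one past the last element */
    int sum = 0;

    for (int *p = array; p != end; ++p)
        sum += *p;          /* defined: p points to an actual element here */

    /* int bad = *end;        undefined: unary * applied to the one-past pointer */
    /* int *far = array + 6;  undefined: the addition itself, even with no dereference */
    return sum;
}
```

Comparing against end is fine because the one-past pointer is a valid pointer value; only applying * to it, or computing anything further out, violates 6.5.6 8.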
for my understanding, the compiler will try to read the value inside * (array+6) so it is defined,…
No, *(array+i) refers to element i of the array array when the behavior is defined. When the behavior is not defined, it does not have any meaning.
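For indices where the behavior is defined, the equivalence can be checked directly. The following is a minimal sketch, again assuming a five-element array; same_element is a hypothetical helper, and callers are expected to pass only indices 0 through 4, because outside that range neither expression has a meaning to compare.

```c
#include <stdbool.h>

/* Assumed for illustration: the array in question has five elements. */
static int array[5] = {10, 20, 30, 40, 50};

/* For 0 <= i <= 4, array[i] and *(array + i) designate the same element. */
bool same_element(int i)
{
    return &array[i] == array + i    /* same address */
        && array[i] == *(array + i); /* same value */
}
```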
… so if I understand right, what is undefined is what happens when you try to read uninitialized memory.
The rules around reading uninitialized memory are complicated, and they are not actually relevant here. Since *(array+6) is undefined, it does not mean to read memory. It has no meaning as far as the C standard is concerned.
When the compiler is compiling code and encounters *(array+6) used as a value, it will ordinarily generate instructions to get the value of element 6 of the array. By “ordinarily,” I mean when the behavior is defined. However, modern compilers are not simplistic code generators. They perform semantic analysis and optimization of the program. Instead of reading code and generating instructions, they read code and generate a semantic representation of the program. That semantic representation and the rules of the C standard are operated on by optimizing software. Once the optimizing software is done, then instructions are generated. In this process, the semantic analyzer may determine that *(array+6) has no meaning, and then the optimizer will strip it away. It will not generate code to load from the memory that array+6 would reference if things were different.
This is useful when there is code such as:
    if (some test)
    {
        x = array[6];
        various code;
    }
    else
    {
        x = array[3];
        various code;
    }
When the compiler can deduce in a certain situation that array[6] is not defined, then it can conclude that the only defined path through this code is if some test is false. It can then generate the code for the “else” case and remove the code for the “then” case.
Logically, this is equivalent to “Since the behavior of the code in the ‘then’ case is not defined, it can be anything. If we choose to make it the same as the code in the ‘else’ case, then we can use the same instructions for both cases, so we do not need to branch based on some test and will have smaller code. This conforms to the C standard, because it does not define what the ‘then’ case does.”
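To make that reasoning concrete, here is a compilable sketch of the pattern. Everything in it is hypothetical: the five-element array, the function names pick and pick_simplified, and the parameter some_test standing in for “some test”. The second function is not the output of any particular compiler; it only shows a translation the C standard permits, and whether a given compiler actually performs it depends on the compiler, its version, and the optimization settings.

```c
#include <stdbool.h>

/* Assumed for illustration: the array in question has five elements. */
static int array[5] = {10, 20, 30, 40, 50};

/* The pattern from the text. The "then" branch has undefined behavior,
   so this function has a defined result only when some_test is false.
   A compiler may warn about the out-of-bounds index. */
int pick(bool some_test)
{
    int x;
    if (some_test)
        x = array[6];   /* undefined: index is beyond one past the end */
    else
        x = array[3];   /* defined */
    return x;
}

/* A translation the standard allows for pick: the undefined branch is
   treated as if it did the same thing as the defined one, so no test
   and no branch remain. */
int pick_simplified(bool some_test)
{
    (void)some_test;    /* the test no longer affects the result */
    return array[3];
}
```

For every call whose behavior is defined, that is, every call with some_test false, the two functions return the same value, which is all the standard requires of the compiler.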