int arr[2];
int b = arr[2];

According to [expr.sub#1]

The expression E1[E2] is identical (by definition) to *((E1)+(E2)), except that in the case of an array operand, the result is an lvalue if that operand is an lvalue and an xvalue otherwise.

Where (E1)+(E2) is determined by [expr.add#4.2]

When an expression J that has integral type is added to or subtracted from an expression P of pointer type, the result has the type of P.

  • [...]
  • Otherwise, if P points to an array element i of an array object x with n elements ([dcl.array]), the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) array element i + j of x if 0 ≤ i + j ≤ n and the expression P - J points to the (possibly-hypothetical) array element i - j of x if 0 ≤ i - j ≤ n
  • Otherwise, the behavior is undefined.
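
To make the quoted rule concrete, here is a minimal sketch (not taken from the standard) of the case it permits: forming and comparing the pointer to the hypothetical element without ever dereferencing it. The function names `sum_two_elements` and `distance_to_end` are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Sums a two-element array, using the one-past-the-end pointer only as a
// loop bound. That pointer is formed and compared, never dereferenced.
int sum_two_elements() {
    int arr[2] = {10, 20};
    int* first = arr;      // points to element 0
    int* last = arr + 2;   // i + j == n: points to the hypothetical element 2
    assert(last - first == 2);
    int sum = 0;
    for (int* p = first; p != last; ++p) {
        sum += *p;         // only elements 0 and 1 are read
    }
    return sum;
}

// Distance from the first element to the one-past-the-end pointer.
std::ptrdiff_t distance_to_end() {
    int arr[2] = {};
    return (arr + 2) - arr;
}
```

Calling `sum_two_elements()` reads only the two real elements; whether `*(arr + 2)` may additionally be read is exactly what the rest of this question is about.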

In this example, arr + 2 points to the hypothetical array element and there is no UB here. The result of *(arr+2) is a glvalue that refers to the object pointed to by arr + 2. In the current standard, no normative rule, including [basic.life], explicitly says that indirection through such a pointer results in UB. Hence, I think there are two interpretations of this example. One is that an actual object of type int has been created in that storage, in which case int b = arr[2]; is well-defined. The other is that there is no object, or an object of another type, in that storage, in which case [basic.lval#11] applies:

If a program attempts to access the stored value of an object through a glvalue whose type is not similar to one of the following types the behavior is undefined:

  • the dynamic type of the object,
  • a type that is the signed or unsigned type corresponding to the dynamic type of the object, or
  • [...]

In this case, the example is judged to be UB. So I wonder: is this example definitely UB, or is it, as I analyzed above, an unspecified case?

xmh0511
  • "that explicitly says indirection of the pointer would result in UB" - it isn't the case that every program exhibiting undefined behavior must explicitly be given undefined behavior. Instead, every program whose behavior is not explicitly defined exhibits undefined behavior. In this case, you have UB because the rules for array subscript apply specifically to indices which are in bounds for the array. – user2407038 Jul 27 '21 at 03:03
  • @user2407038 [expr.sub] does not specify the limit of the value of array subscript. – xmh0511 Jul 27 '21 at 03:09
  • You're asking about `*(arr + 2)` but have quoted a part of the standard which gives the behaviour of `arr + 2`. Furthermore, it does limit the value of the subscript ("0 ≤ i + j ≤ n"), but in this case that condition is satisfied anyway. The condition is written that way because it is meant to include the one-past-the-end pointer, but don't mistake expr.sub for a rule which says there is an object in existence at `arr + 2`. – user2407038 Jul 27 '21 at 03:23
  • The reason your program has UB is roughly: it requires that an object exists at `arr + 2`. There is no rule in the standard which says an object exists there. In particular, [dcl.array] says "An object of type “array of N U” consists of a contiguously allocated non-empty set of N subobjects of type U, known as the elements of the array, and numbered 0 to N-1". – user2407038 Jul 27 '21 at 03:25
  • From the language lawyer perspective, this really is a bit of a tricky question, because you have to make a considerable effort to establish the fact that `arr + 2` does not refer to any of the aforementioned array subobjects (and there are plenty of notes which make this fact abundantly clear, but, based on your question, it seems you're looking for reasoning based on normative rules only) – user2407038 Jul 27 '21 at 03:28
  • @user2407038 So the question comes back to this: is this example **definitely** UB, or unspecified? In my mind, it's determined by the actual object in that storage. – xmh0511 Jul 27 '21 at 03:34
  • The intent of [CWG 232](http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#232) is to explicitly give this example UB. This also implies that the current standard under-specifies whether the example is UB or not. – xmh0511 Jul 27 '21 at 03:45
  • I think the language in that proposal would make this case unambiguous: "If the pointer is a null pointer value (7.3.12 [conv.ptr]) or points one past the last element of an array object (7.6.6 [expr.add]), the result is an empty lvalue and does not refer to any object or function." If there is no object there, then reading from it is UB. However it seems this wording is not in the present version of the standard. I also cannot find any other rules which would explicitly have this effect. – user2407038 Jul 27 '21 at 04:00
  • @user2407038 Yes, that is the status quo of the current standard, hence this question was asked to confirm that status. If there is no other normative rule in the standard that can judge this example, it appears to me that it's unspecified. – xmh0511 Jul 27 '21 at 04:32
  • This reads as explicit UB to me. You are definitely aliasing something with an out-of-bounds read, so the rule 3.10.10 on de-referencing a `glvalue` should apply here. It's not necessary to provide any definition for *what* is aliased by accident, you have to prove that you are not violating any aliasing rules. – Ext3h Jul 27 '21 at 05:19
  • @Ext3h I cannot prove that, since I don't know whether accessing the object that inhabits that storage will violate some rule and cause UB. Conversely, we also cannot prove that accessing that object **definitely** causes UB. In my mind, at most we can say it's unspecified. – xmh0511 Jul 27 '21 at 06:41
  • @Ext3h It could cause UB or could be well-defined, depending on the object that inhabits the storage we access through the glvalue. In conclusion, it sounds a bit like "Schrodinger's Cat", since there is no language rule saying that reading the value of such a glvalue causes UB regardless of what the actual object is. – xmh0511 Jul 27 '21 at 06:56
  • @xmh0511 Spot-on, it's context-dependent *and* implementation-dependent whether you are running into explicit UB. If you have either placed an object of legal type in the correct location in memory, or the compiler has done so (and you know for certain), then it's perfectly legal. Otherwise it's explicitly undefined behavior. – Ext3h Jul 27 '21 at 11:58
  • @xmh0511 Also mind the *C* `restrict` qualifier, which enforces the rules you were expecting here, and thereby also enables the compiler optimizations which this "might legally alias by overflow or arithmetic in case of known layout" loophole prevented in the first place. I'm sure it is UB if you can't control the memory layout, so the compiler may assume that you couldn't deliberately alias by overflow alone. Or to phrase it differently: if the compiler could choose a layout such that your code would access outside any allocated memory, the code is therefore malformed. – Ext3h Jul 27 '21 at 16:43
  • @Ext3h An answer mentioned by `@bogdan` is reasonable for this question; see the [comment](https://stackoverflow.com/questions/68538300/does-that-access-to-a-hypothetical-array-element-definitely-ub?noredirect=1#comment121145877_68542409). It means this trick of using overflow to manipulate some objects is UB in C++. – xmh0511 Jul 28 '21 at 01:35

1 Answer

In this example, arr + 2 points to the hypothetical array element and there is no UB here. The result of *(arr+2) is a glvalue that refers to the object pointed to by arr + 2.

You're skipping a step here. The hypothetical array element is hypothetical. That means it does not actually exist. But in the next sentence, you're talking about "the object pointed to by arr + 2". There is no object at that address.

The bit you quote from [basic.lval#11] is about aliasing - where an object does exist. It does not apply to hypothetical objects.

To clarify a misunderstanding from one of your comments, it should be noted that in C++ it is possible for memory to exist without an object existing at that address. The whole point of placement new is to create an object in memory where one didn't exist before. So you can't assume that there must be some sort of object at arr+2.
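
As a minimal sketch of that point (the function name `create_then_read` and the buffer name `storage` are made up for illustration), the following reserves raw storage first and only afterwards creates an `int` in it with placement new. The code does not rely on any `int` being present before the `new` expression runs.

```cpp
#include <cassert>
#include <new>

// Creates an int inside pre-existing raw storage, then reads it back.
int create_then_read() {
    // Suitably sized and aligned storage. The example never treats this
    // buffer as an int before the placement new below.
    alignas(int) unsigned char storage[sizeof(int)];

    // Placement new creates an int object in that storage and returns a
    // pointer to it.
    int* p = ::new (static_cast<void*>(storage)) int(42);
    assert(static_cast<void*>(p) == static_cast<void*>(storage));

    // Reading through p is fine: p points to the object just created.
    return *p;
}
```

The storage is there from the first line of the function, but the `int` that the return statement reads is the one created by the `new` expression.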

MSalters
  • In other words, if we didn't explicitly create an object in that storage (including via some operation that implicitly creates such an object), we cannot assume an object exists there, even if an object actually exists in that storage on the machine. Right? – xmh0511 Jul 27 '21 at 09:46
  • @xmh0511: No, that does not make sense. You're mixing the C++ abstract model (in which you have objects) and the underlying physical reality of the machine (where you have CPUs and physical RAM). Now the two are reasonably aligned, but compilers definitely look at your C++ code from the C++ side of things. `*(arr+2)` is a C++ expression. A C++ compiler may reasonably decide that that value is in a register since it doesn't refer to a C++ object in memory. Alternatively, it can decide the whole bit of code must be unreachable and just discard it in code generation. – MSalters Jul 27 '21 at 09:55
  • Consider this example: `int arr[2][2]; int(*arr_ptr)[2] = arr; int b = (*arr_ptr)[2]`. Is `(*arr_ptr)[2]` UB? `*arr_ptr` refers to an array object of type `array of 2 int`. In this example, the element is a hypothetical element of the array that `(*arr_ptr)` refers to. – xmh0511 Jul 27 '21 at 09:57
  • @xmh0511: That is a different question. – MSalters Jul 27 '21 at 10:13
  • Nope, they're essentially the same question, since you said "The hypothetical array element is hypothetical. That means it does not actually exist." In the above example, `(*arr_ptr)[2]` can arguably be said to refer to the **hypothetical** element of `*arr_ptr`; however, an object, namely `arr[1][0]`, does exist at the address of that hypothetical element. – xmh0511 Jul 27 '21 at 12:57
  • @xmh0511 `(*arr_ptr)[2]` is UB in your example. `arr[1][0]` does exist at that address, but `(*arr_ptr) + 2` does not *point to* it. *Point to* is a special standard term, defined in [\[basic.compound\]/3](https://timsong-cpp.github.io/cppwp/n4861/basic.compound#3). See also, for example, [this answer](https://stackoverflow.com/a/68317729/4326278) and associated comments. – bogdan Jul 27 '21 at 16:56
  • @bogdan The answer to that question is rational to me. The value of a pointer is categorized into four kinds, and the `*` operator is only defined for pointers that point to objects or functions. I still hope that [CWG 232](http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#232) will be adopted, since it is clearer on these points and relaxes the conditions that could cause UB. – xmh0511 Jul 28 '21 at 01:28
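
For the `int arr[2][2]` example discussed in the comments above, here is a hedged sketch of an access that reaches `arr[1][0]` through a pointer that actually points to its row, rather than by indexing past the end of row 0. The function name `read_second_row_first_element` and the initializer values are made up for illustration.

```cpp
#include <cassert>

// Reads arr[1][0] by first stepping to row 1, then indexing inside that row.
int read_second_row_first_element() {
    int arr[2][2] = {{1, 2}, {3, 4}};
    int (*arr_ptr)[2] = arr;            // points to row 0 (arr[0])

    // (*arr_ptr)[2] would index past the end of row 0; per the comments
    // above, that access is UB, so it is deliberately not performed here.

    int (*next_row)[2] = arr_ptr + 1;   // points to row 1, a real element of arr
    assert(next_row == &arr[1]);
    return (*next_row)[0];              // reads arr[1][0]
}
```

Here every pointer that gets dereferenced points to an actual element of the array it was derived from, which is the distinction bogdan's comment draws with the term *point to*.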