C/C++ -- Is writing to a multi-dimensional array from 0 offset UB?

Question

Kindly examine the code below:

#include <stdio.h>

#define N 2
#define M 2

int main(void)
{
    int two_d[N][M];
    for(size_t i = 0; i < N*M; ++i) {
        two_d[0][i] = i;  // <---- Pay attention to this line!
    }
    for(size_t i = 0; i < N; ++i) {
        for(size_t j = 0; j < M; ++j) {
            printf("%d\n", two_d[i][j]);
        }
    }
    return 0;
}

Please don't be too quick to dismiss this example as contrived: I found it myself in a very real and quite well-known project (too famous to be named).

I would appreciate a phone number of a good language lawyer!

On the one hand, the memory is guaranteed to be laid out sequentially, so I'm not accessing anything beyond the object as a whole.
On the other hand, I'm clearly accessing memory beyond the first 1D array, and doing that is UB.

Example compiles and runs fine on my machine. Mr. Godbolt shows that both C and C++ compilers do the same thing, and with optimizations both handle it like a doctor.

So, the questions are:

Is this legal in C?
Is this legal in C++?

Standards quotes would be appreciated.

There's a strong convention for organizing the contents of 2D arrays in a particular way, but I'm not sure it's enforced by the standard. You're doing out of bounds access, which is not allowed. It might *work* though. — tadman, Jul 22 '23 at 19:36
Do not tag both C and C++ except when asking about differences or interactions between the two languages, especially for language-lawyer questions. Pick one tag and delete the other. If you need an answer for both questions, you could ask separate questions and relate them with links. However, this is a duplicate for C and is likely a duplicate for C++. — Eric Postpischil, Jul 22 '23 at 19:52
@tadman, the most dangerous outcome of UB is always that the code "works." — Chris, Jul 22 '23 at 19:58
@Chris Oh, don't we all know it. Works great, until it doesn't. — tadman, Jul 22 '23 at 19:59
@tadman starring "THAT WORKED ON MY MACHINE!" and "EVERYTHING WAS FINE YESTERDAY!" — tntnkn, Jul 22 '23 at 20:02
@EricPostpischil do I need to change it in this case? It is somewhat about the difference between the languages, as you mentioned. — tntnkn, Jul 22 '23 at 20:03
@tntnkn: This question asks about the rules in C and about the rules in C++, not so much the difference between them. It is a problem because, when you ask about both, somebody might answer for C. And then, in the future, when somebody is searching for an answer about C++, they would find that answer because of the C tag, and their time would have been wasted. Spamming tags results in poor quality search results later. As long as this question remains closed as a duplicate, it is not as much of a problem, but it should be avoided. — Eric Postpischil, Jul 22 '23 at 20:05
@EricPostpischil ok, so I will leave the `C++` part as it was answered in a more complete form. Could I refer to @0___________ answer in `P.S.` to mention the rules for `C`? — tntnkn, Jul 22 '23 at 20:10
To access linearly, you want to use `*(two_d + i)`. In your code, you are going outside the boundaries for the `M` dimension. — Thomas Matthews, Jul 22 '23 at 22:01

Nathan Pierson · Accepted Answer · 2023-07-22T20:21:19.817

In C++, the meaning of the subscript expression is given in expr.sub:

With the built-in subscript operator, an expression-list shall be present, consisting of a single assignment-expression. One of the expressions shall be a glvalue of type “array of T” or a prvalue of type “pointer to T” and the other shall be a prvalue of unscoped enumeration or integral type. The result is of type “T”. The type “T” shall be a completely-defined object type. The expression E1[E2] is identical (by definition) to *((E1)+(E2)), except that in the case of an array operand, the result is an lvalue if that operand is an lvalue and an xvalue otherwise.

Following up about the rules for + in expr.add:

When an expression J that has integral type is added to or subtracted from an expression P of pointer type, the result has the type of P.

If P evaluates to a null pointer value and J evaluates to 0, the result is a null pointer value.

Otherwise, if P points to an array element i of an array object x with n elements ([dcl.array]), the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) array element i + j of x if 0 <= i + j <= n and the expression P - J points to the (possibly-hypothetical) array element i - j of x if 0 <= i - j <= n

Otherwise, the behavior is undefined

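Your code snippet invokes undefined behavior. two_d[0] is an array of M ints, so in two_d[0][i] the pointer that two_d[0] decays to points at element 0 of an array with n = M = 2 elements. For i = 2 the addition still yields the one-past-the-end pointer, but writing through it is undefined; for i = 3 the addition itself falls under the final "Otherwise" clause.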

In C, the rules are very similar. From 6.5.2.1/2, array subscripting:

A postfix expression followed by an expression in square brackets [] is a subscripted designation of an element of an array object. The definition of the subscript operator [] is that E1[E2] is identical to (*((E1)+(E2))). Because of the conversion rules that apply to the binary + operator, if E1 is an array object (equivalently, a pointer to the initial element of an array object) and E2 is an integer, E1[E2] designates the E2-th element of E1 (counting from zero).

Then, from 6.5.6/8, additive operators:

When an expression that has integer type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integer expression. In other words, if the expression P points to the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N (where N has the value n) point to, respectively, the i+n-th and i−n-th elements of the array object, provided they exist. Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.

Just like in C++, it's undefined behavior to go outside the bounds of an array, with no special exemption for "but what if there's another array right next to it".

I should probably try a little better to not forget that there is actually the array type... Thank you for your answer. I would accept your answer as the exact quotations were provided. Could you please refer in your answer to the one of @0___________ to make the 'C' part complete? — tntnkn, Jul 22 '23 at 20:06
@tntnkn I went ahead and included the relevant C standard quotes directly in this answer. — Nathan Pierson, Jul 22 '23 at 20:34

score 1 · Answer 2 · edited Jul 22 '23 at 19:59

Is this legal in C?

Is this legal in C++?

No, both are UB.

int array[x][y] is an array of x arrays, each having y int elements. If the second subscript is >= y, you access a y-element int array outside its bounds.

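In C you can avoid the UB by using a union. (This relies on C's rules for reading a union member other than the one last written; in C++, reading an inactive union member is itself undefined behavior.)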


#include <stdio.h>

#define N 2
#define M 2

int main(void)
{
    union
    {
        int two_d[N][M];
        int one_d[N*M];
    }u;

    for(size_t i = 0; i < N*M; ++i) {
        u.one_d[i] = i; 
    }
    for(size_t i = 0; i < N; ++i) {
        for(size_t j = 0; j < M; ++j) {
            printf("%d\n", u.two_d[i][j]);
        }
    }
    return 0;
}

answered Jul 22 '23 at 19:40 by 0___________ · edited Jul 22 '23 at 19:59 by Chris

Thank you for your answer and for the great example of union! Could you though specify the exact rule applicable? – tntnkn Jul 22 '23 at 19:46
6.5.2.1 https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1548.pdf – 0___________ Jul 22 '23 at 19:47
Also, probably an important observation would be that this array does not decay to `int **` when passed to a function, but rather to `int(*)[2]`, which is a thought that hadn't entered my mind when I was just writing the question. – tntnkn Jul 22 '23 at 19:49
@tntnkn no, the reason is different - `**` is a pointer to pointer so you need the underlying pointer (physical pointer object). – 0___________ Jul 22 '23 at 19:51
@0___________ Nothing in 6.5.2.1 in that link directly mentions bounds on valid values of `E2`. Think you need the additional cite to 6.5.6 – Nathan Pierson Jul 22 '23 at 19:52
@NathanPierson there is a human-friendly explanation in the example. OP should do his own research – 0___________ Jul 22 '23 at 19:53
@0___________ yeah, my bad, `**` was a mistake, thank you, – tntnkn Jul 22 '23 at 19:57
[“OP should do his own research”](https://stackoverflow.com/questions/76745467/c-c-is-writing-to-multi-dimesional-array-from-0-offset-ub#comment135300738_76745505) [“Any proof? You need to have some evidence supporting your claim”](https://stackoverflow.com/questions/76744675/if-i-bound-the-value-of-a-variable-with-an-operation-like-min-am-i-safe-to-ma#comment135299227_76744707) – Eric Postpischil Jul 22 '23 at 19:59
