As I understand it, the C++ standard guarantees that multidimensional arrays are implemented as contiguous data structures in row-major order. For example, I believe the two arrays below are guaranteed to have the same representation in memory:
int a[2][3] = {{1, 2, 3}, {4, 5, 6}};
int b[6] = {1, 2, 3, 4, 5, 6};
This means that a[i][j]
is functionally equivalent to *((int*)a + 3*i + j)
.
I'm trying to use this to index multidimensional arrays more efficiently. Below is an example that checks if a 7x7 array of integers contains only zeroes. I wrote three versions:
int Empty1(const int (&a)[7][7]) {
for (int i = 0; i < 7; ++i) {
for (int j = 0; j < 7; ++j) {
if (a[i][j] != 0) return false;
}
}
return true;
}
int Empty2(const int (&a)[7][7]) {
for (int i = 0; i < 49; ++i) {
if (a[0][i] != 0) return false;
}
return true;
}
int Empty3(const int (&a)[7][7]) {
const int *p = &a[0][0];
for (int i = 0; i < 49; ++i) {
if (p[i] != 0) return false;
}
return true;
}
Empty1() is obviously correct.
I wrote Empty2() to simplify the outer loop. But to my surprise, GCC and Clang both “optimize” Empty2() to always return false: https://godbolt.org/z/x7rWcrx8M
Empty3() is the same idea as Empty2(), but this time it compiles as intended.
I'm guessing the “optimization” in the case of Empty2() happens because I'm technically indexing out of range of the first dimension of the array, so the compiler figures it can return whatever it wants.
This is a little surprising to me, because although I understand the rule that you cannot index an array outside its range, in this case I know that all referenced elements are part of the array. After all, a[i][j]
is equivalent to *((int*)a + 7*i + j)
, so why shouldn't I be able to let j
range from 0 to 49 when i == 0
?
I have two questions:
- What's the relevant part of the standard that defines this as undefined behavior?
- Is the final version (
Empty3()
) guaranteed to work as I intended (or did I just get lucky?)
And a bonus question: from the above I would assume that the following is also undefined behavior:
int LastElement(const int (&a)[7][7]) {
return a[7][-1];
}
but this seems to compile as intended (i.e., it returns the element at a[6][6]). What gives?
edit: So based on NathanOliver's link, I think the conclusion is that Empty3() is still wrong, but the following variant should be allowed:
int Empty4(const int (&a)[7][7]) {
auto b = reinterpret_cast<const int (&)[49]>(a);
for (int i = 0; i < 49; ++i) {
if (b[i] != 0) return false;
}
return true;
}