0

As I understand it, the C++ standard guarantees that multidimensional arrays are implemented as contiguous data structures in row-major order. For example, I believe the two arrays below are guaranteed to have the same representation in memory:

int a[2][3] = {{1, 2, 3}, {4, 5, 6}};
int b[6] = {1, 2, 3, 4, 5, 6};

This means that a[i][j] is functionally equivalent to *((int*)a + 3*i + j).

I'm trying to use this to index multidimensional arrays more efficiently. Below is an example that checks if a 7x7 array of integers contains only zeroes. I wrote three versions:

int Empty1(const int (&a)[7][7]) {
    for (int i = 0; i < 7; ++i) {
        for (int j = 0; j < 7; ++j) {
            if (a[i][j] != 0) return false;
        }
    }
    return true;
}

int Empty2(const int (&a)[7][7]) {
    for (int i = 0; i < 49; ++i) {
        if (a[0][i] != 0) return false;
    }
    return true;
}

int Empty3(const int (&a)[7][7]) {
    const int *p = &a[0][0];
    for (int i = 0; i < 49; ++i) {
        if (p[i] != 0) return false;
    }
    return true;
}

Empty1() is obviously correct.

I wrote Empty2() to simplify the outer loop. But to my surprise, GCC and Clang both “optimize” Empty2() to always return false: https://godbolt.org/z/x7rWcrx8M

Empty3() is the same idea as Empty2(), but this time it compiles as intended.

I'm guessing the “optimization” in the case of Empty2() happens because I'm technically indexing out of range of the first dimension of the array, so the compiler figures it can return whatever it wants.

This is a little surprising to me, because although I understand the rule that you cannot index an array outside its range, in this case I know that all referenced elements are part of the array. After all, a[i][j] is equivalent to *((int*)a + 7*i + j), so why shouldn't I be able to let j range from 0 to 49 when i == 0?

I have two questions:

  1. What's the relevant part of the standard that defines this as undefined behavior?
  2. Is the final version (Empty3()) guaranteed to work as I intended (or did I just get lucky?)

And a bonus question: from the above I would assume that the following is also undefined behavior:

int LastElement(const int (&a)[7][7]) {
    return a[7][-1];
}

but this seems to compile as intended (i.e., it returns the element at a[6][6]). What gives?

edit: So based on NathanOliver's link, I think the conclusion is that Empty3() is still wrong, but the following variant should be allowed:

int Empty4(const int (&a)[7][7]) {
    auto b = reinterpret_cast<const int (&)[49]>(a);
    for (int i = 0; i < 49; ++i) {
        if (b[i] != 0) return false;
    }
    return true;
}
Maks Verver
  • 661
  • 4
  • 14
  • 2
    https://stackoverflow.com/questions/50795498/treating-2d-array-as-1d-array – NathanOliver Aug 18 '22 at 16:34
  • As an answer to the bonus question: [link](https://en.cppreference.com/w/cpp/language/ub) . "UB and optimization" subheading. – yokus Aug 18 '22 at 16:38
  • 2
    A side note: I always prefer to manage my multi dimensional arrays "manually": allocate a continous block of memory for the whole array and use stride(s) to access specific elements (e.g. for 2D: `element_idx = y * stride + x`). It's also the approach used by opencv and numpy as far as I know so I believe it is quite efficient. – wohlstad Aug 18 '22 at 16:52
  • For your first question: https://stackoverflow.com/questions/68316903/out-of-the-bounds-in-c-and-undefined-behaviour – Özgür Murat Sağdıçoğlu Aug 18 '22 at 17:42

1 Answers1

1

Is indexing multidimensional arrays out of range undefined behavior?

Yes, because the expression a[i][j] is equivalent to (a[i])[j] due to operator precedence.

Now since going out of bounds of a 1D array is undefined behavior, this implies that if any of the index i or j in the above expression (a[i])[j]==a[i][j] goes beyond range xrange - 1 = 2 -1 =1 and yrange - 1 = 3 - 1 = 2(for your given example) respectively, we will have undefined behavior.

Note xrange and yrange are shorthand for the initial size that you specified as rows and columns of the array which correspond to 2 and 3 in your example.

Jason
  • 36,170
  • 5
  • 26
  • 60