C++: Semantics of [x] for dereferencing: C-style vs pointer array

Question

I know that a c style array is stored as a contiguous block of memory. That is why the following code:

int main (int argc, char *argv[]) {
    int arr[3][3];
    *(*arr + 5) = 5;
    std::cout << arr[1][2] << std::endl;
    return 0;
}

prints 5. I assume that for c style arrays *(*arr + 5) = 5; is roughly equals to the code which the compiler produces for arr[1][2] = 5; isn't it? (Q1)

If so, then the semantics of arr[1][2] (i.e. shifting on one block of memory) is totally different from doing the same on a multidimensional pointer array where every level of nesting results in a pointer being dereferenced. Is that right? (Q2)

Is there any situation where I need to pay attention to that myself? I.e. where the compiler does not know himself what kind of array he is dealing with? (Q3)

(Qx marks my questions)

Thank you in advance and Regards

Don't missunderstood me, your questions are interesting, but please avoid multiquestion posts and do one question per post. — Manu343726, Dec 22 '13 at 12:28
I'm pretty sure it's technically undefined behaviour to do that, even if no compiler today will do anything bad. — chris, Dec 22 '13 at 12:29
For question 2, I think this could be interesting for you: http://stackoverflow.com/questions/2895433/casting-char-to-char-causes-segfault — Manu343726, Dec 22 '13 at 12:30
In general, see http://stackoverflow.com/questions/4810664/how-do-i-use-arrays-in-c/ — Matteo Italia, Dec 22 '13 at 12:47

Joseph Mansfield · Accepted Answer · 2013-12-22T12:54:04.063

On one hand, you're talking about a two dimensional array, which you can imagine looks something like this in memory:

 0,0 0,1 0,2 1,0 1,1 1,2 2,0 2,1 2,2
┌───┬───┬───┬───┬───┬───┬───┬───┬───┐
│int│int│int│int│int│int│int│int│int│
└───┴───┴───┴───┴───┴───┴───┴───┴───┘

On the other hand, when you have an array of pointers to arrays, it looks like this:

┌────┬────┬────┐
│int*│int*│int*┿━━━━━━━━━━━━━━┓
└─╂──┴─╂──┴────┘              ┃
  ┃    ┗━━━━━━━━┓             ┃
  ▼             ▼             ▼
┌───┬───┬───┐ ┌───┬───┬───┐ ┌───┬───┬───┐
│int│int│int│ │int│int│int│ │int│int│int│
└───┴───┴───┘ └───┴───┴───┘ └───┴───┴───┘ 
 0,0 0,1 0,2   1,0 1,1 1,2   2,0 2,1 2,2

Both of these can be indexed using the same [i][j] syntax. The [] operator is defined as x[i] being equivalent to *((x) + (i)). If we index x[1][1], we have *((*((x) + (1)) + (1)).

In the first case above, first the array name x undergoes array-to-pointer conversion to get a pointer to its first element (which is itself an array, so we have a int (*)[3]), then we add 1 to it to move along to the next subarray and dereference it. This subarray then also undergoes array-to-pointer conversion to get a pointer to its first element, which we add 1 to again and dereference. So what we end up with the end is the 2nd element in the 2nd subarray.

In the second case, we are dealing with an array of pointers. First the array name x undergoes array-to-pointer conversion to get a pointer to its first element (which is a int*). Then we add 1 to it to move to the next pointer in the array and it is dereferenced to get the int* itself. Then we add 1 to this int* to move along to the next int after the one it is currently pointing at and we dereference that. That again gives us the 2nd element in the 2nd array of ints.

So, given all that:

Yes, since the elements of the 2D array are contiguous, you can do pointer arithmetic where you treat it as a 1D array with 9 ints in it.
Yes, the memory layout of the data in each case is different, so the operations that occur when indexing into them are different.
The compiler always knows which type it is dealing with. The compiler won't let you, for example, attempt to convert a 2D array to a int**. They are simply incompatible. However, you generally need to make sure you know what the memory layout of your data is.

Sometimes you have the following kind of layout where you have an int**, particularly common when you dynamically allocate an array of pointers that point to dynamically allocated arrays:

┌─────┐
│int**│
└─╂───┘
  ▼
┌────┬────┬────┐
│int*│int*│int*┿━━━━━━━━━━━━━━┓
└─╂──┴─╂──┴────┘              ┃
  ┃    ┗━━━━━━━━┓             ┃
  ▼             ▼             ▼
┌───┬───┬───┐ ┌───┬───┬───┐ ┌───┬───┬───┐
│int│int│int│ │int│int│int│ │int│int│int│
└───┴───┴───┘ └───┴───┴───┘ └───┴───┴───┘ 
 0,0 0,1 0,2   1,0 1,1 1,2   2,0 2,1 2,2

The process of indexing this layout is almost exactly the same as the second case above. The only difference is that the first array-to-pointer conversion is not necessary because we already have a pointer.

+1 Oh, I have nothing to do with my answer if you draw this awesome ascii diagrams.. :) — Manu343726, Dec 22 '13 at 12:48
Thanks, I always find diagrams useful for understanding the C++ object model. I tend to have diagrams forming in my mind when I'm coding. — Joseph Mansfield, Dec 22 '13 at 12:50
Me too. I have always this Stroupstrup's image in my mind :) http://bulldozer00.files.wordpress.com/2012/02/mem-layout.png?w=595 — Manu343726, Dec 22 '13 at 12:52
Thank you very much for your extensive answer. Obviously for c style arrays the compiler needs to know the size of every but the first dimension. Is that also a reason for why c style arrays don't exist on heap? (They don't if I'm not totally off the road) — ben, Dec 22 '13 at 13:02
@sftrabbit Hi!May i ask how to type these ascii characters? some of them are not shown on keyboard — richard.g, Dec 22 '13 at 13:07
@richard.g I always just copy them from [Wikipedia](http://en.wikipedia.org/wiki/Box-drawing_character) when I need them. — Joseph Mansfield, Dec 22 '13 at 13:20
@sftrabbit that's cruel ┘┬└. I had thought that you might have used alt codes to type them. — richard.g, Dec 22 '13 at 13:40
@richard.g Yeah, I could probably do with learning how to type them, but there's so many. In fact, it'd be nice if there was a tool for drawing these diagrams. — Joseph Mansfield, Dec 22 '13 at 13:41

score 0 · Answer 2 · answered Dec 22 '13 at 12:32

Q1: This is called pointer arithmetic. An array of integers is technically just an int* in practical use, and adding 1 to a pointer makes the pointer point to the next item in the list. Address-wise it's equivalent to raising the in-memory address with sizeof(int).

Q2: While syntactically equivalent, the compiler treats it differently because it knows to treat differently because of the declaration. If arr is an int** the internals are compiled as a multidimensional pointer array, if it's an int[x][y] it just does the maths for you inside and keeps it flat, hence why the pointer arithmetic still works.

Q3: No, if the compiler wouldn't know what it was dealing with it would be a fundamental flaw in the language. This is actually pretty well defined.

score 0 · Answer 3 · answered Dec 22 '13 at 12:36

A1: Static multidimension arrays (2D in this case) can be accessed using either 2D or 1D indexes. The elements are one after the other w/o gaps.

A2: the compiler knows the relevant offsets at compile time so a[x][y] is actually a single pointer dereference.

A3: The compiler always know the type and will produce code accordingly.

score 0 · Answer 4 · edited May 23 '17 at 10:25

Answer 1:

You are correct. A multiimensional array is only a contiguous block of arrays, where that arrays are contiguous blocks of elements too. So when you create an array like this:

    int array[n1][n2]...[nm];

What the compiler does is to allocate a n1 * n2 * ... * nm length unidimensional contiguous array. Thats why your *(*array + 5) works.
In fact when you index elements of a multidimensional array: array[index_1][index_2]...[index_m] = 0; What the compiler really does is something like this:

    array[index_1 * ( n1 * n2 * ... * n(m-1) ) + index_2 * ( n1 * n2 * ... * n(m-2) ) + .. + index_m] = 0;

Answer 2:

Multidimensional arrays are not the same as pointers to pointers to pointers to pointers... Multidimensional arrays are, as I said before, continious regions of aligned memory, and dynamic arrays of pointers to dynamic arrays of pointers to... are arrays which elements are pointers. Thats is, things that are dessigned to be dereferenced. Thats why treating a **array as an array[][] may fail, and biceversa. For more info about this topic, see this thread: casting char[][] to char** causes segfault?

SliceSort · Answer 5 · 2013-12-22T12:56:01.087

-1

if you define

int arr[A][B];

then

arr[x][y]

is actually

*(arr+x*B+y) //array name can be treated as memory location or const pointer

so, double dereference in your code is wrong. <== This line is wrong. Sorry.

edited Dec 22 '13 at 12:56

answered Dec 22 '13 at 12:37

SliceSort

357
3
5

Hm as I stated the code that I posted runs on my machine. Also I actually think your code does not run as expected. AFAIK in *(arr+A*x+y) the part: A*x+y will result in some number which is then multiplied by: sizeof(int) * A. But I would be glad if somebody could conform that as I'm not an expert here. – ben Dec 22 '13 at 12:40
1

Double dereference is right, the point is that `*arr` is of type `int (*)[3]`. – Matteo Italia Dec 22 '13 at 12:42
@user2820379 sorry for typping mistake. Corrected. – SliceSort Dec 22 '13 at 12:43
@Matteo Italia In defenition `int arr[A][B];`, arr can only be dereference once. – SliceSort Dec 22 '13 at 12:50
1

@SliceSort: nope. You dereference once and you get an `int (*)[3]`, you dereference again and you get an `int`. If you could dereference only once expressions like `a[x][y]`, which is (by *definition* of the subscript operator) `*(*(a+x)+y)`, couldn't work. – Matteo Italia Dec 22 '13 at 12:55
@Matteo Italia Sorry for my wrong answer and thanks for your explanation! – SliceSort Dec 22 '13 at 12:57
1

Look up the [array FAQ](http://stackoverflow.com/questions/4810664/how-do-i-use-arrays-in-c/) and see by yourself. – Matteo Italia Dec 22 '13 at 12:58

C++: Semantics of [x] for dereferencing: C-style vs pointer array

5 Answers5

Answer 1:

Answer 2: