4

While doing some research on multi-dimensional arrays in C and how they're stored in memory I came across this: "Does C99 guarantee that arrays are contiguous?". The top-voted answer states that "It must also be possible to iterate over the whole array with a (char *)," then provides the following "valid" code:

int  a[5][5], i, *pi;
char *pc;

pc = (char *)(&a[0][0]);
for (i = 0; i < 25; i++)
{
    pi = (int *)pc;
    DoSomething(pi);
    pc += sizeof(int);
}

The poster then goes on to say that "Doing the same with an (int *) would be undefined behavior, because, as said, there is no array[25] of int involved."

That line confuses me.

Why is using a char pointer valid, defined behavior, while substituting an int pointer is not?

Sorry if the answer to my question should be obvious. :(

kylemart

3 Answers

3

The difference between using a char* and an int* comes down to strict aliasing rules: if you access (&a[0][0])[6] (i.e. via an int*), the compiler is free to assume that the access [6] does not leave the array a[0]. As such, it is free to assume that (&a[0][0]) + 6 and a[1] + 1 point to different memory locations, even though they don't, and to reorder accesses to them accordingly.

The char* case is different because char* is explicitly exempted from strict aliasing rules: you can cast any object pointer to a char* and manipulate the object's bytes through this pointer without invoking undefined behavior.
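To illustrate the character-pointer exemption, here is a minimal sketch that walks the whole 5x5 array one byte at a time. The helper names copy_bytes and bytewise_copy_matches are made up for this example, and unsigned char is used instead of plain char only because it is the customary type for inspecting raw bytes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies n bytes one at a time through
   unsigned char pointers, i.e. through the object representation. */
void copy_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
}

/* Fills a 5x5 array, copies it bytewise into a second one,
   and returns 1 if the copy is identical to the original. */
int bytewise_copy_matches(void)
{
    int a[5][5], b[5][5];

    for (int r = 0; r < 5; r++)
        for (int c = 0; c < 5; c++)
            a[r][c] = r * 5 + c;

    /* sizeof a covers all 25 ints, so the byte loop crosses row boundaries. */
    copy_bytes(b, a, sizeof a);

    return memcmp(a, b, sizeof a) == 0 && b[4][4] == 24;
}
```

Calling bytewise_copy_matches() from main should return 1: the byte pointer crosses from one row into the next, which is exactly what the int* version is not allowed to do.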

cmaster - reinstate monica
  • So, using *both* `(&a[0][0]) + 6` and `a[1] + 1` to read *and* write those two memory locations in a function may cause incorrect behavior when the compiler reorders these accesses, right? – Norman Feb 01 '16 at 13:08
  • Could someone please elaborate on what it means for the compiler to "reorder their accesses"? – kylemart Feb 01 '16 at 16:46
  • 1
    @ky1emart Consider this code: `int* foo = &a[0][0]; a[1][1] = 7; foo[6] = 42; printf("%d\n", a[1][1]);` You may find that this code prints "7" instead of the expected "42" because a) the compiler has emitted code that first writes 42 and then 7 to the memory location, so that the `printf()` reads the later 7, or b) the compiler has emitted code that first writes the 7, then reads it back, and then overwrites it with 42. Both variants lead to equally unintended results, and neither one honors the sequence of events that you have written down. – cmaster - reinstate monica Feb 01 '16 at 18:54
  • @cmaster Sorry for the stream of questions, but why would the compiler perform that unintended re-order? I read a similar example on Wikipedia (https://en.m.wikipedia.org/wiki/Pointer_aliasing) but *why* this behaviour occurs is never explained. – kylemart Feb 01 '16 at 19:02
  • 1
    @ky1emart It is an important optimization for two reasons: 1. It allows the compiler to put more other instructions between a read from memory and the use of the read value. Memory reads have a really high latency, so it's important to keep the CPU busy while the data is delivered to it. 2. It allows the compiler to move reads of constant data out of loops. Consider `for(int i = 0; i < N; i++) integers[i] *= (int)*floatPtr;`. Without strict aliasing rules, `*floatPtr` would have to be read from memory in every iteration. With strict aliasing rules, it can be read once before the loop. – cmaster - reinstate monica Feb 01 '16 at 19:56
1

The standard is very clear that if you have:

int a[5];
int* p = &a[0];

Then

p += 6;

is cause for undefined behavior.

We also know that memory allocated for a 2D array such as

int a[5][5];

must be contiguous. Given that, if we use:

int* p1 = &a[0][0];
int* p2 = &a[1][0];

p1+5 is a legal expression and given the layout of a, it is equal to p2. Hence, if we use:

int* p3 = p1 + 6;

why should that not be equivalent to

int* p3 = p2 + 1;

If p2 + 1 is a legal expression, why should p1 + 6 not be a legal expression?

From a purely pedantic interpretation of the standard, using p1 + 6 is cause for undefined behavior. However, it is possible that the standard does not adequately address the issue when it comes to 2D arrays.

In conclusion

From all practical points of view, there is no problem in using p1 + 6.
From a purely pedantic point of view, using p1 + 6 is undefined behavior.
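For anyone who wants to stay on the pedantic side, here is a small sketch of visiting all 25 elements without ever moving an int* past the end of its own row. The function names sum_5x5 and demo_sum are invented for this illustration.

```c
#include <stddef.h>

/* Hypothetical helper: sums every element of a 5x5 array.
   Each row pointer is only indexed within its own 5-element row. */
int sum_5x5(int a[5][5])
{
    int sum = 0;

    for (size_t r = 0; r < 5; r++) {
        int *row = a[r];              /* first element of row r */
        for (size_t c = 0; c < 5; c++)
            sum += row[c];            /* c stays in 0..4 */
    }
    return sum;
}

/* Fills the array with 0..24 and returns the sum of all elements. */
int demo_sum(void)
{
    int a[5][5];

    for (int r = 0; r < 5; r++)
        for (int c = 0; c < 5; c++)
            a[r][c] = r * 5 + c;

    return sum_5x5(a);                /* 0 + 1 + ... + 24 = 300 */
}
```

A fresh pointer is taken from a[r] for each row, so no pointer arithmetic ever depends on p1 + 6 being valid.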

R Sahu
0

Either an int pointer or a char pointer should work, but the operation differs slightly between the two cases. Assuming sizeof(int) is 4, pc += sizeof(int) moves the pointer 4 bytes forward, but pi += sizeof(int) would move it 4 times 4 bytes forward. If you want to use an int pointer, you should use pi++.

EDIT: Sorry about the answer above; using an int pointer does not comply with C99 (although it usually works in practice). The reason is explained well in the original question: a pointer that crosses from one array into another is not well defined by the standard. If you use an int pointer, you start from a[0], which is a different array from a[1]. In this case, an int pointer derived from a[0] cannot legally (with well-defined behavior) point to an element of a[1].

SECOND EDIT: Using a char pointer is valid, for the following reason given by the original answer:

the array as a whole must be working when given to memset, memmove or memcpy with the sizeof. It must also be possible to iterate over the whole array with a (char *).

From section 6.5.6 "Additive Operators"

For the purposes of these operators, a pointer to an object that is not an element of an array behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type.

So it is reasonable.
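As a rough sketch of the memcpy point quoted above: the whole 2D array can be copied with sizeof into a genuine int[25], and that flat array can then be walked with an int* without any question about row boundaries. The names flat_copy_sum and flat_copy_element are hypothetical, chosen only for this example.

```c
#include <string.h>

/* Copies a 5x5 array into a flat 25-element array and sums the copy. */
int flat_copy_sum(void)
{
    int a[5][5];
    int flat[25];
    int sum = 0;

    for (int r = 0; r < 5; r++)
        for (int c = 0; c < 5; c++)
            a[r][c] = r * 5 + c;

    /* sizeof a is 25 * sizeof(int), the same size as flat. */
    memcpy(flat, a, sizeof a);

    /* flat really is an array of 25 ints, so p may visit all of them. */
    for (int *p = flat; p < flat + 25; p++)
        sum += *p;

    return sum;
}

/* Returns the flat element at index i after the same copy (0 <= i < 25). */
int flat_copy_element(int i)
{
    int a[5][5];
    int flat[25];

    for (int r = 0; r < 5; r++)
        for (int c = 0; c < 5; c++)
            a[r][c] = r * 5 + c;

    memcpy(flat, a, sizeof a);
    return flat[i];
}
```

With this fill pattern, flat[7] holds the value stored in a[1][2], which shows the row-major, contiguous layout without relying on out-of-row int* arithmetic.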

xuhdev
  • 1
    That's not the issue. The issue is whether it is legal to increment an `int*` that is initialized to `&a[0][0]` 25 times. The linked answer contends that it is not. – R Sahu Feb 01 '16 at 05:46
  • Thanks for the quick response + revisions, but how does a char pointer resolve the UB issue? That's the crux of my question. – kylemart Feb 01 '16 at 06:03