7

There is a lot of code like this:

#include <stdio.h>

int main(void)
{
    int a[2][2] = {{0, 1}, {2, -1}};
    int *p = &a[0][0];

    while (*p != -1) {
        printf("%d\n", *p);
        p++;
    }
    return 0;
}

But based on this answer, the behavior is undefined.

N1570. 6.5.6 p8:

When an expression that has integer type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integer expression. In other words, if the expression P points to the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N (where N has the value n) point to, respectively, the i+n-th and i−n-th elements of the array object, provided they exist. Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.

Can someone explain this in detail?

David Ranieri
  • I assume that is because the type of `a[0]` and `a[1]` is array of `int`s, not an `int`. – Dabo Aug 14 '14 at 08:48
  • 2
    did you read christoph's answer in your linked question? http://stackoverflow.com/a/7787436/3684343 I think he explained it very well. – mch Aug 14 '14 at 08:53
  • 3
    Yes, loads of code with UB out there. – n. m. could be an AI Aug 14 '14 at 08:54
  • 1
    Ranks side-by-side with walking a pointer *backward* through an array until its value is "less than" the first element address. It's *amazing* how frequently it is done, and how almost no one who does it has any idea they're invoking UB in doing so. – WhozCraig Aug 14 '14 at 08:58
  • 2
    I cannot find anything in N1570 to justify this rule, other than "because the standard says so". It seems that the array subscripting and `sizeof` rules prevent any kind of padding between different array dimensions. I wonder if there is any standard-compliant system on which the above code would break. – user694733 Aug 14 '14 at 09:10
  • @user694733 Remember strict aliasing int (*)[] != int * – Vality Aug 14 '14 at 09:19
  • @Vality I don't think strict aliasing explains *why* this rule exists. The compiler has enough information to sort this out. In the end both array subscripting and direct pointer access result in the same thing: pointer arithmetic and dereferencing an `int *`. – user694733 Aug 14 '14 at 09:38
  • @user694733 In the example case, yes the compiler can sort this out, but say the array gets passed to another function, this could then become more complex for the compiler, particularly if the function sometimes is passed real arrays and sometimes pointers. However I may read into this to find out more. – Vality Aug 14 '14 at 09:47
  • 1
    Now that I think about this, maybe the purpose of this restriction is to allow placement of subarrays on different memory banks, kind of what PICs have. So a[0] and a[1] might be placed on different banks, and the example code would fail because compiler assumes that there is no need for bank switching instructions in the loop. – user694733 Aug 14 '14 at 10:35
  • @JonathanLeffler this is the issue we were debating in comments the other day – M.M Feb 24 '15 at 09:14

3 Answers

9

The array whose base address (pointer to first element) is assigned to p is of type int[2]. This means the address in p can legally be dereferenced only at locations *p and *(p+1), or if you prefer subscript notation, p[0] and p[1]. Furthermore, p+2 is guaranteed to be legally evaluated as an address, and is comparable to other addresses in that sequence, but cannot be dereferenced. This is the one-past address.

The code you posted violates the one-past rule by dereferencing p once it passes the last element in the array in which it is homed. That the array in which it is homed is buttressed up against another array of similar dimension is not relevant to the formal definition cited.

That said, in practice it works. But, as is often said, observed behavior is not, and should never be considered, defined behavior. Just because it works doesn't make it right.
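
For comparison, here is a minimal sketch of a traversal that stays within the rule described above: each row gets its own pointer, and the one-past address of that row is only compared, never dereferenced. The helper name `print_until` and the `ROWS`/`COLS` macros are made up for illustration; they are not from the question.

```c
#include <stdio.h>

#define ROWS 2
#define COLS 2

/* Hypothetical helper (not from the question): prints values until the
 * sentinel is found and returns how many values were printed. */
static int print_until(int a[ROWS][COLS], int sentinel)
{
    int printed = 0;

    for (int i = 0; i < ROWS; i++) {
        int *p = a[i];            /* first element of row i */
        int *end = a[i] + COLS;   /* one past row i: compared, never dereferenced */

        for (; p != end; p++) {
            if (*p == sentinel)
                return printed;
            printf("%d\n", *p);
            printed++;
        }
    }
    return printed;               /* sentinel not found */
}
```

Called as `print_until(a, -1)` on the array from the question, this prints 0, 1 and 2 without ever moving a pointer out of the row it was derived from. Unlike the original loop, it also stops at the end of the array if the sentinel is missing.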

WhozCraig
  • Thank you, Whoz. I understand that I can check `if (p + 3) {` but I can't dereference `int x = *(p + 3);`, right? – David Ranieri Aug 14 '14 at 09:39
  • 2
    No, even the address *value* `p+3` is not legal to use for evaluation, comparison, or dereference. It is outside the range of addresses in `a[0]` ... `a[0]+2` (the latter being the one-past address of the `int[2]` array at `a[0]`). – WhozCraig Aug 14 '14 at 09:51
  • @WhozCraig I'm coming from answering http://stackoverflow.com/questions/29666141/what-is-wrong-passing-a-2d-array-to-a-respective-pointer-argument and wonder who's right (there is somebody wrong on the internet! https://xkcd.com/386/). Is `a` not guaranteed to occupy contiguous memory? I could certainly legally iterate through `a` via a char*. There is no aliasing issue with an int* either, given n1570's recursive exemption in 6.5p7 regarding aggregates. So where do you find wording that accessing `p+3` is UB? Would it, in your opinion, make a difference to cast `a`'s address directly to an int*? – Peter - Reinstate Monica Apr 16 '15 at 13:46
4

The object representation of pointers is opaque, in C. There is no prohibition against pointers having bounds information encoded. That's one possibility to keep in mind.

More practically, implementations can also perform certain optimizations based on assumptions that rules like these allow them to make, for example about aliasing.

Then there's the protection of programmers from accidents.

Consider the following code, inside a function body:

struct {
    char c;
    int i;
} foo;

char * cp1 = (char *) &foo;
char * cp2 = &foo.c;

Given this, cp1 and cp2 will compare as equal, but their bounds are nonetheless different. cp1 can point to any byte of foo and even to "one past" foo, but cp2 can only point to "one past" foo.c, at most, if we wish to maintain defined behaviour.

In this example, there might be padding between the foo.c and foo.i members. While the first byte of that padding co-incides with "one past" the foo.c member, cp2 + 2 might point into the other padding. The implementation can notice this during translation and instead of producing a program, it can advise you that you might be doing something you didn't think you were doing.

By contrast, if you read the initializer for the cp1 pointer, it intuitively suggests that it can access any byte of the foo structure, including padding.

In summary, this can produce undefined behaviour during translation (a warning or error) or during program execution (by encoding bounds information); there's no difference, standard-wise: The behaviour is undefined.
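
As a rough illustration of the cp1 case, the sketch below walks every byte of a struct, padding included, through a pointer derived from the struct itself. The names `struct pair` and `all_bytes_zero` are invented here (the struct above is anonymous), and `unsigned char` is used in place of plain `char` so that each byte reads as a non-negative value.

```c
struct pair {          /* hypothetical name for the struct shown above */
    char c;
    int i;
};

/* Hypothetical helper: returns 1 if every byte of *obj, padding
 * included, is zero, and 0 otherwise. */
static int all_bytes_zero(const struct pair *obj)
{
    /* Derived from the whole object, so it may visit all of its bytes. */
    const unsigned char *p = (const unsigned char *)obj;
    const unsigned char *end = p + sizeof *obj;   /* one past the object */

    for (; p != end; p++)
        if (*p != 0)
            return 0;
    return 1;
}
```

A pointer obtained from `&obj->c` instead would, by the reasoning above, only be good for that single byte and its one-past address, even though it compares equal to `p` at the start of the loop.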

Shao
  • 1
    I think you meant "padding bytes". Yes, but there is no requirement on an implementation to distinguish between the two cases. So if it can refuse to translate and give you a warning in one example, it can do the same in the other. The idea is: Program what you mean instead of what might happen to work or ought to work. (int *) &a will be fine. – Shao Aug 14 '14 at 10:01
1

You can cast your pointer into a pointer to array to ensure the correct array semantics.

This code is indeed undefined, but it is provided as a C extension by every compiler in common use today.

However, the correct way of doing it would be to cast the pointer into a pointer to array, like so:

((int (*)[2])p)[0][0]

to get the zeroth element or say:

((int (*)[2])p)[1][1]

to get the last.

To be strict, the reason I think this is illegal is that you are breaking strict aliasing: pointers to different types may not point to the same address (variable).

In this case you are creating a pointer to an array of ints and a pointer to an int and pointing them at the same value. This is not allowed by the standard, as the only type that may alias another pointer is a char *, and even this is rarely used properly.
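
A minimal sketch of the pointer-to-array indexing described above, written so that no cast is needed: the function keeps the `int (*)[2]` type from the start instead of converting back from an `int *`. This is a variation on the cast shown earlier, not the same expression, and the names `sum_rows` and `COLS` are assumptions made for the example.

```c
#define COLS 2

/* Hypothetical helper: sums nrows rows of COLS ints each. */
static int sum_rows(int (*rows)[COLS], int nrows)
{
    int sum = 0;

    for (int r = 0; r < nrows; r++)
        for (int c = 0; c < COLS; c++)
            sum += rows[r][c];   /* each index stays inside its own row */
    return sum;
}
```

With `int a[2][2] = {{0, 1}, {2, -1}};`, the array decays to `int (*)[2]` when passed, so `sum_rows(a, 2)` adds up all four elements, and `rows[1][1]` is the last one.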

Vality
  • Strict aliasing says that the value representation of one type may not be read as memory of another type (except for a few permitted aliases). It's always OK to read `int` as `int`, even if one is an aggregate member and one isn't. The idea about pointers to different types pointing to overlapping storage is covered by `restrict`. – M.M Feb 24 '15 at 09:20