0

I was playing around with some arrays and pointers in C and started wondering whether doing this would be undefined behavior.

int (*arr)[5] = malloc(sizeof(int[5][5]));

// Is this undefined behavior?
int val0 = arr[0][5];

// Rephrased, is it guaranteed it'll always have the same effect as this line?
int val1 = arr[1][0];

Thank you for any insights.

Gusgo99
  • *I'd also be curious if this would be undefined behavior in C* -- The `new` keyword doesn't exist in C. – PaulMcKenzie Apr 05 '21 at 00:30
  • Since the allocated memory is contiguous, it technically should always give you the same result. I believe it might not be undefined behaviour. I have to note, though, that however funny it might look, you shouldn't use it, ever. – Maras Apr 05 '21 at 00:39
  • @PaulMcKenzie The `new` was only there to show the allocated array is big enough to allow for `arr[1][0]` to be accessed. But I removed it to avoid confusion about the actual question. – Gusgo99 Apr 05 '21 at 00:40
  • @Maras: The fact that memory is contiguous does not mean the behavior is defined by the language standard. – Eric Postpischil Apr 05 '21 at 00:40
  • 'undefined behavior' might be, and since the array dims are unspecified, the two examples do not have the same effect. – Martin James Apr 05 '21 at 00:41
  • @Maras I do expect it to work on most platforms, I'm just wondering if it is guaranteed by the standard that it will work. And no, I don't plan on using it in actual software. Just trying to find that out of curiosity. – Gusgo99 Apr 05 '21 at 00:42
  • Oh....you edited it and removed the declaration. This Q. is a mess now:(( – Martin James Apr 05 '21 at 00:44
  • You should probably add the language-lawyer tag in that case. I would also suggest picking only one language, since the answer might, or might not, be the same in both. – cigien Apr 05 '21 at 00:44
  • Does this help - https://stackoverflow.com/questions/43851470/cast-t-to-t ? – Ajay Brahmakshatriya Apr 05 '21 at 00:59
  • @cigien Thank you for the tip. I applied the tag and changed the question to C only. – Gusgo99 Apr 05 '21 at 01:01
  • The behaviour is undefined. Hence whether it works or not is undefined. @Gusgo99 the "I expect it to work on most platforms" is **exactly** what one should **never** expect. – Antti Haapala -- Слава Україні Apr 05 '21 at 01:11
  • @AnttiHaapala: Unfortunately, the Standard makes no effort to avoid characterizing as Undefined Behavior actions which 90%+ of implementations should process consistently. Instead, it goes out of its way to characterize as UB actions which some implementations might be unable to meaningfully define, even if most implementations should process them identically. [The term "Implementation-Defined" is reserved for things that *all* implementations are required to define]. Unfortunately, some compiler writers think of the Committee's priorities as being the opposite of what they actually were. – supercat Apr 06 '21 at 20:37

2 Answers

3

In C, what you're doing is undefined behavior.

The expression arr[0] has type int [5]. So the expression arr[0][5] dereferences one element past the end of the array arr[0], and dereferencing past the end of an array is undefined behavior.

Section 6.5.2.1p2 of the C standard regarding Array Subscripting states:

The definition of the subscript operator [] is that E1[E2] is identical to (*((E1)+(E2))).
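To make the quoted identity concrete, here is a small sketch. The function names `read_subscript` and `read_pointer` are made up for illustration; both assume `i` is in the range 0 to 4, so every access stays inside the row.

```c
/* Two spellings of the same access. By 6.5.2.1p2, arr[0][i] is defined
   to mean *(*(arr + 0) + i), so the inner subscript is pointer arithmetic
   on a pointer into the five-element row arr[0]. */
int read_subscript(int (*arr)[5], int i)
{
    return arr[0][i];
}

int read_pointer(int (*arr)[5], int i)
{
    return *(*(arr + 0) + i);
}
```

Seen this way, `arr[0][5]` is `*(arr[0] + 5)`: the addition yields the one-past-the-end pointer of the row, and the `*` is then applied to it.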

And section 6.5.6p8 of the C standard regarding Additive Operators states:

When an expression that has integer type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integer expression. In other words, if the expression P points to the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N (where N has the value n) point to, respectively, the i+n-th and i−n-th elements of the array object, provided they exist. Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.

The last two sentences of that passage specify that the addition implicit in an array subscript may not result in a pointer more than one element past the end of an array, and that a pointer to one element past the end of an array may not be dereferenced.
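As a rough illustration of what 6.5.6p8 does allow, the sketch below forms a one-past-the-end pointer and uses it only for comparison, never as the operand of `*`. The helper name `sum_row` is an assumption made for this example.

```c
#include <stddef.h>

/* Sums the n elements of one row. `end` points one past the last element:
   forming it and comparing against it is allowed; dereferencing it is not. */
int sum_row(const int *row, size_t n)
{
    const int *end = row + n;   /* one past the end: valid to form */
    int sum = 0;
    for (const int *p = row; p != end; ++p)
        sum += *p;              /* p is strictly before end whenever we get here */
    return sum;
}
```

With `int a[2][5]`, a call such as `sum_row(a[0], 5)` stays within the first row: the pointer reaches `a[0] + 5` but the loop exits before reading through it.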

The fact that the array in question is itself an element of an array, so that the subarrays are contiguous in memory, doesn't change this. With aggressive optimization settings, a compiler may note that accessing past the end of the array is undefined behavior and optimize on the assumption that it never happens.
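If the goal is to treat the storage as one run of 25 ints, one option (a sketch, not the only way) is to allocate it as a flat array and compute the index by hand, so that all pointer arithmetic happens inside a single array object. The names `ROWS`, `COLS`, `grid_alloc` and `grid_at` are hypothetical.

```c
#include <stdlib.h>

enum { ROWS = 5, COLS = 5 };

/* Allocates ROWS * COLS zero-initialized ints as one flat array.
   Returns NULL on failure; the caller frees the result with free(). */
int *grid_alloc(void)
{
    return calloc((size_t)ROWS * COLS, sizeof(int));
}

/* Address of element (r, c), computed as an index into the flat array.
   Requires 0 <= r * COLS + c < ROWS * COLS. */
int *grid_at(int *grid, int r, int c)
{
    return &grid[r * COLS + c];
}
```

Here `grid_at(g, 0, 5)` and `grid_at(g, 1, 0)` both compute index 5 of the same 25-element array, so the two spellings name the same element without stepping outside any array bound.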

dbush
  • Thank you very much. This is exactly what I was looking for – Gusgo99 Apr 05 '21 at 00:58
  • Does anything in the Standard allow `memcpy` to be used to copy a two-dimensional array in one step, or is support for such a construct left as a quality-of-implementation issue on the presumption that quality implementations will support it whether or not it's "technically" defined? – supercat Apr 06 '21 at 19:16
  • @supercat `memcpy` can be used to copy arrays and structures, so I don't see why a 2D array would be any different. It probably falls under the umbrella of 1) allowing a `char *` to access the representation of an object, and 2) `memcpy` being considered part of the implementation and therefore not strictly subject to the standard. – dbush Apr 06 '21 at 19:21
  • @dbush: Given `char x[10][10],y[10][10];`, the operation `memcpy(x[0],y[0],11);` would attempt to copy a byte past the end of array object `x[0]`. "If an array is accessed beyond the end of an object, the behavior is undefined.". Obviously a quality implementation should support such a use case, but when the Standard was written it would have been considered obvious that quality implementations should support many constructs that the clang and gcc optimizers make no attempt to handle reliably. – supercat Apr 06 '21 at 19:26
  • @supercat And at the same time, `memcpy(x, y, 100);` would be fine. I agree that your case "should" work, but I wouldn't put it past an aggressive compiler to do something that might be considered surprising. – dbush Apr 06 '21 at 19:30
  • @dbush: When the Standard was written, the authors saw no need to explicitly define behavior in all the situations where an implementation would have to go out of its way not to behave in an obvious and useful fashion, and forbid such deviant behavior. Clang and gcc interpret the Standard as mandating only those cases whose omission would so completely break the language as to be undeniably absurd (even though the Standard doesn't actually mandate those), and interpreting its failure to mandate support for other useful cases as an intention to imply that any code exploiting them is "broken". – supercat Apr 06 '21 at 19:42
  • @dbush: I'd be interested in your thoughts about the examples in my answer, with regard to situations where the authors of the Standard would have intended to allow implementations to assume that code won't perform out-of-bounds inner-array accesses, situations where they would have intended that implementations allow programmers to exploit such accesses, and situations in the middle that the authors of the Standard never considered. – supercat Apr 06 '21 at 20:32
1

The Standard is clearly intended to avoid requiring that a compiler given something like:

int foo[5][10];
int test(int i)
{
  foo[1][0] = 1;
  foo[0][i] = 2;
  return foo[1][0];
}

must reload the value of foo[1][0] to accommodate the possibility that the write to foo[0][i] might affect foo[1][0]. On the other hand, before the Standard was written, it would have been idiomatic to write something like:

#include <stdio.h>

void dump_array(int *p, int rows, int cols)
{
  int i,j;
  for (i=0; i<rows; i++)
  {
    for (j=0; j<cols; j++)
      printf("%6d", *p++);
    printf("\n");
  }
}
int foo[5][10];
...
  dump_array(foo[0], 5, 10);

and nothing in the published Rationale suggests that the authors had any intention of forbidding such constructs nor breaking code that used them. Indeed, the primary benefit of requiring that rows of an array be placed consecutively, even when adding padding would improve efficiency, is to allow such code to function.
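For comparison, here is a sketch of a variant that keeps the two-dimensional type in the parameter, so each access `a[i][j]` stays inside row `i` and no pointer is advanced across a row boundary. It assumes a compiler that supports variably modified parameter types (required in C99, optional in C11). The name `dump_rows`, the `FILE *` parameter, and the returned element count are additions for this example and are not part of the original idiom.

```c
#include <stdio.h>

/* Prints a rows x cols matrix to `out`, one row per line.
   The parameter a has type int (*)[cols], so the subscripts are checked
   against the row length by the type system rather than flattened.
   Returns the number of elements printed. */
int dump_rows(FILE *out, int rows, int cols, int a[rows][cols])
{
    int count = 0;
    for (int i = 0; i < rows; i++) {
        for (int j = 0; j < cols; j++) {
            fprintf(out, "%6d", a[i][j]);
            count++;
        }
        fputc('\n', out);
    }
    return count;
}
```

Given `int foo[5][10];`, the call would be `dump_rows(stdout, 5, 10, foo);`, passing the whole array rather than `foo[0]`.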

At the time the Standard was written, when generating code for a function that received a pointer, compilers would treat the pointer as though it might identify some arbitrary part of some arbitrary larger object, without making any effort to know or care about what that enclosing object might be. They would thus, as a very popular form of "conforming language extension", support constructs like dump_array without regard for whether the Standard required them to do so, and consequently the authors of the Standard saw no reason to worry about when the Standard mandated such support. Instead, they left such matters as a Quality of Implementation issue over which the Standard could waive jurisdiction.

Unfortunately, because the authors of the Standard expected that compilers would treat the act of passing a pointer to a function as implicitly "laundering" it, the authors of the Standard saw no need to define any explicit method for laundering information about a pointer's enclosing objects in cases where it would be necessary for a function to treat a pointer identifying "raw" storage. Such distinctions didn't matter given the state of compiler technology in the 1980s, but may be quite relevant if e.g. code does something like:

int matrix[10][10];
void test2(int c)
{
  matrix[4][0] = 1;
  dump_array(matrix[0], 1, c);
  matrix[4][0] = 2;
}

or

void test3(int r)
{
  matrix[4][0] = 1;
  dump_array((int*)matrix, r, 10);
  matrix[4][0] = 2;
}

Depending upon what the function is intended to do, having a compiler optimize out the first write to matrix[4][0] in one or both may improve efficiency, or it may cause the generated code to behave uselessly. Treating explicit pointer conversions as erasing type information, but treating array-to-pointer decay as retaining it, would allow programmers to achieve required semantics if they write code as in the second example, while allowing compilers to perform the relevant optimizations when source code is written as in the first example. Unfortunately, the Standard makes no distinctions, and maintainers of free compilers are loath to forgo any "optimizations" they view the Standard as giving them, leaving the language with nothing but "hope for the best" semantics except on implementations that either refrain from cross-procedural optimizations or document what needs to be done to block them.

supercat