3

I think this code should warn about an out-of-bounds array access:

int foo() {
  int x[10] = {0};
  int *p = &x[5];
  return p[~0LLU];
}

I know out-of-bounds warnings are not required by the standard but compilers do give them. I'm asking whether it would be correct for the compiler to give such a warning here.

Is there any reason why that code should be considered well-formed?

Goswin von Brederlow
  • Comments are not for extended discussion; this conversation has been [moved to chat](https://chat.stackoverflow.com/rooms/186893/discussion-on-question-by-goswin-von-brederlow-shouldnt-this-give-an-out-of-bou). – Samuel Liew Jan 18 '19 at 01:13

3 Answers

4

I think this code should warn about an out-of-bounds array access:

A decent compiler could warn you when you're doing that on non-VLA arrays (gcc does not, but clang does: https://godbolt.org/z/lOvl5n)

For this snippet:

int foo() {
  int x[10] = {0};  
  return x[~0LLU];  // or x[40] to make it simpler, same thing
}

warning:

<source>:3:10: warning: array index -1 is past the end of the array (which contains 10 elements) [-Warray-bounds]
  return x[~0LLU];
         ^ ~~~~~

The compiler knows that this is an array and knows its size, so it can check the bounds when everything is known at compile time (a non-VLA array and a literal index are the prerequisites).

In your case, what "loses" the compiler is that you're assigning to a pointer (array decays into a pointer)

After that, the compiler isn't able to tell the origin of the data, so it cannot check the bounds (even if, in your case, the offset is ludicrously big / negative / whatever). A dedicated static analysis tool might find the issue.

Jean-François Fabre
  • @NathanOliver yes, but compiler diagnostic thinks it's -1 anyway :) – Jean-François Fabre Jan 17 '19 at 15:05
  • I think this `-1` thing is a bug. Or some implementation defined thing. Also `-1` can't be "past the end of array". Looks like some display-only thing – Eugene Sh. Jan 17 '19 at 15:25
  • probably. Who cares: it's wrong on purpose. But given the comments in the question, this seems to interest people a lot. Maybe matter for a question by itself :). That said note that you can pass negative offsets to pointers. And compiler doesn't expect 2**64-1 as an offset. Not enough ram in a town anyway :) – Jean-François Fabre Jan 17 '19 at 15:27
  • By "bug" I meant a compiler bug. – Eugene Sh. Jan 17 '19 at 15:30
  • The question really comes down to whether ~0LLU is -1 or not. Because `p[-1]` from the question is fine. What makes the compiler in the above example think x[~0LLU] == x[-1]? And is it right to do so? – Goswin von Brederlow Jan 17 '19 at 15:54
  • No, I think this is a different question altogether. – Jean-François Fabre Jan 17 '19 at 15:58
  • @GoswinvonBrederlow re: [whether or not ~0LLU is -1 or not](https://stackoverflow.com/questions/54238506/shouldnt-this-give-an-out-of-bounds-warning#comment95303932_54238594). `~0LLU` is an `unsigned long long`. It cannot have the value of -1. – chux - Reinstate Monica Jan 17 '19 at 16:12
  • @GoswinvonBrederlow `p[-1]` is fine, because there's nothing to say that `p[0]` is the start of the array. The part of the compiler that checks bounds probably converts to a signed number simply to avoid having 2 code paths - anything over 0x8000000000000000 will be out of bounds on any computer that will be made in our lifetime, so it makes no practical difference. – Mark Ransom Jan 17 '19 at 19:14
  • IMO, this answer's _what "loses" the compiler is that you're assigning to a pointer_ combined with @Eric Postpischil [comment](https://stackoverflow.com/questions/54238506/shouldnt-this-give-an-out-of-bounds-warning/54238578?noredirect=1#comment95302904_54238506) would make the best answer. – chux - Reinstate Monica Jan 18 '19 at 00:41
  • Re “The compiler knows that this is an array, knows the size and therefore can check bounds if everything is literal”: In this case, the compiler does not need to know the array size. Unless it supports arrays with `~0ULL` elements, `p+~0ULL` can never have behavior defined by the C or C++ standards. Maybe you would want the compiler not to warn if you are supporting operating system or bare metal code where you expect people might do funky things with unsigned arithmetic and pointers. But, in normal code, if somebody adds a larger-than-possible value to any pointer, you could warn. – Eric Postpischil Jan 18 '19 at 03:05
  • I think we could create 2 separate questions here. Because the "max_uint" turning into -1 or not is a good question itself (even if not as practical as "why the compiler doesn't warn"). All discussions seem to revolve around this. Most compilers probably use `-1` (a bug?) and it works. – Jean-François Fabre Jan 18 '19 at 07:49
  • @Jean-FrançoisFabre What's the other question? Because I think whether in this case (and by what reasoning) max_uint turns into -1 is the only question. If the conversion to signed is valid the code is valid. If it isn't valid then it's an out-of-bounds. – Goswin von Brederlow Jan 21 '19 at 10:07
  • Down voting this answer because `x[~0LLU]` eliminates the ambiguity the question is about. It will always be out-of-bounds. – Goswin von Brederlow Jan 21 '19 at 13:29
2

The C language imposes no requirements on bounds checking of arrays. That is part of what makes it fast. That being said, compilers can and do perform checks in some situations.

For example, if I compile with -O3 in gcc and replace return p[~0LLU]; with return p[10]; I get the following warning:

x1.c: In function ‘foo’:
x1.c:6:10: warning: ‘*((void *)&x+60)’ is used uninitialized in this function [-Wuninitialized]
   return p[10];

I get a similar warning if I use -10 as the index:

gcc -g -O3 -Wall -Wextra -Warray-bounds -o x1 x1.c
x1.c: In function ‘foo’:
x1.c:6:10: warning: ‘*((void *)&x+-20)’ is used uninitialized in this function [-Wuninitialized]
   return p[-10];

So it does seem that it can warn about invalid negative values for an array index.

In your case, it seems for this compiler that the value ~0LLU is converted to a signed value for the purposes of pointer arithmetic and is viewed as -1.
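
The difference between the value `~0LLU` actually has and the value it takes after a conversion to a signed type can be sketched in C. The function names below are made up for illustration, and the `-1` result rests on an assumption: in C11 (6.3.1.3) an out-of-range conversion to a signed type is implementation-defined, and the usual two's complement compilers simply reinterpret the bits.

```c
#include <limits.h>
#include <stdbool.h>

/* ~0LLU has type unsigned long long, so it is the largest value of that
   type and can never compare equal to a negative number. */
bool all_ones_is_ullong_max(void) {
    return ~0LLU == ULLONG_MAX;
}

/* Converting it to long long is a separate step. ASSUMPTION: the
   implementation wraps on this out-of-range conversion, as mainstream
   two's complement compilers do; C11 leaves the result
   implementation-defined. */
long long all_ones_as_signed(void) {
    return (long long)~0LLU;
}
```

On mainstream targets `all_ones_as_signed()` gives -1, which would explain the behaviour seen here; the unsigned value itself is never negative.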

Note that this check can be fooled by putting other initialized variables around x:

#include <stdio.h>

int foo() {
  int y[10] = {0};
  int x[10] = {0};
  int z[10] = {0};
  int *p = &x[5];
  printf("&x=%p, &y=%p, &z=%p\n", (void *)x, (void *)y, (void *)z);
  return p[10] + y[0] + z[0];
}

This code produces no warnings even though p[10] is out of bounds.

So it's up to the implementation whether it performs an out-of-bounds check and how it does so.

dbush
  • Big part of the question is whether this is an overflow or if the standard mandates this evaluates to p[-1]. – Goswin von Brederlow Jan 17 '19 at 15:14
  • OP: "I think this code should warn about an out-of-bounds array access". OP isn't talking about runtime error, but rather compile-time error. – Jean-François Fabre Jan 17 '19 at 15:19
  • Agree that in OP's case "`~0LLU` is converted to a signed value for the purposes of pointer arithmetic and is viewed as -1", yet C does not impose that conversion - it allows it. On another platform `p[~0LLU]` attempts an array access with a large positive value - too large for `x[]`. – chux - Reinstate Monica Jan 17 '19 at 16:39
  • @chux, I do not think C even *allows* that interpretation, except in the sense that the behavior is undefined, and therefore anything can happen. – John Bollinger Jan 17 '19 at 18:29
  • @JohnBollinger Certainly allowed for `~0LLU` to be a valid index to an array - someday. Even in a real sense in 2019 considering not all the memory needs to be physically there to access an element. `unsigned long long` is not even specified as the widest integer type available. OP's concern about `~0LLU` should instead be with `UINTMAX_MAX` as it is the `(u)intmax_t` that imposes some limits, not `unsigned long long`. – chux - Reinstate Monica Jan 17 '19 at 18:57
  • @chux, I'm saying that C specifies the semantics of pointer addition in terms of conventional mathematics, without regard to data type of the operands. Certainly C does not allow `~0ULL` itself to be interpreted as -1, and neither does it allow such a reinterpretation in the context of pointer arithmetic, except in the sense that anything is allowed when the behavior is undefined. – John Bollinger Jan 17 '19 at 19:21
  • @JohnBollinger [Agreed](https://stackoverflow.com/questions/54238506/shouldnt-this-give-an-out-of-bounds-warning/54238578?noredirect=1#comment95310143_54238578). – chux - Reinstate Monica Jan 17 '19 at 19:31
  • @dbush Please remove this answer as it adds nothing to the question. It's not about how smart the compiler is but about the validity of the code. Whether the standard mandates some form of conversion making ~0LLU a valid index or not. Consensus (see discussion) is that it does not and the code is invalid due to out-of-bounds. – Goswin von Brederlow Jan 28 '19 at 10:12
2

Edit: Complete rewrite, with standard quotes:

[dcl.array] [ Note: Except where it has been declared for a class, the subscript operator [] is interpreted in such a way that E1[E2] is identical to *((E1)+(E2))

[expr.add] When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the expression P points to element x[i] of an array object x with n elements, the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) element x[i + j] if 0 ≤ i + j ≤ n; otherwise, the behavior is undefined.

Therefore p[~0LLU] is interpreted identically to *(p + ~0LLU) (as per [dcl.array]) where the parenthesised expression points to the element x[5 + ~0LLU] - if the index is within the valid range - (as per [expr.add]). If the index isn't within range, the behaviour is undefined.
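
The [expr.add] condition can be sketched as a small C check. This is only an illustration: `forward_offset_is_defined` is a hypothetical helper, it handles non-negative offsets only, and it compares `j` against `n - i` so that the test itself never wraps around.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of any standard or library.
   P points to x[i] in an array of n elements; j is a non-negative offset.
   P + j is defined only if 0 <= i + j <= n in the mathematical sense
   (n itself is allowed: one past the end). */
bool forward_offset_is_defined(size_t n, size_t i, unsigned long long j) {
    if (i > n) {
        return false;      /* P was not a valid position to begin with */
    }
    return j <= n - i;     /* same as i + j <= n, but cannot overflow */
}
```

For the pointer in the question, `forward_offset_is_defined(10, 5, 4)` and `forward_offset_is_defined(10, 5, 5)` hold (the latter is the one-past-the-end position, which may be formed but not dereferenced), while `forward_offset_is_defined(10, 5, ~0LLU)` does not.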

Is 5 + ~0LLU within the valid range of indices, though? Given the language's integer conversion rules, the expression would appear to be well-defined if the 5 had a signed type no larger than unsigned long long, in which case the element pointed to would be x[4]. However, the standard doesn't explicitly define the types of i and j in the expression that describes the behaviour. It should be interpreted as a purely mathematical expression, in which case the result is an index that is unrepresentable in unsigned long long and certainly greater than n, and thus the behaviour is undefined.
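
The two readings of `5 + ~0LLU` can be compared directly. A minimal sketch, assuming the 5 is treated as an ordinary `int` operand (which is exactly the assumption the paragraph above questions); the function names are hypothetical:

```c
#include <limits.h>
#include <stdbool.h>

/* Typed reading: 5 is converted to unsigned long long and the sum wraps
   modulo ULLONG_MAX + 1, giving 4. */
unsigned long long typed_sum(void) {
    return 5 + ~0LLU;
}

/* Mathematical reading: the true sum exceeds ULLONG_MAX exactly when the
   unsigned addition would wrap, which can be detected before adding. */
bool true_sum_exceeds_ullong_max(unsigned long long i, unsigned long long j) {
    return j > ULLONG_MAX - i;
}
```

`typed_sum()` yields 4, while `true_sum_exceeds_ullong_max(5, ~0LLU)` is true, so under the mathematical reading the index is far larger than n = 10.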

Given the interpretation that behaviour is undefined, it wouldn't be incorrect for the compiler to warn. Regardless, the compiler is not required to warn.

eerorika
  • I said "should", not "must". gcc, clang and other compilers do give such warning. Edited the question to clarify. – Goswin von Brederlow Jan 17 '19 at 15:03
  • The addition of `5 + ~0LLU` makes sense to be 4 if the `5` was of a type lower ranked than `unsigned long long` as it is in an integer expression. Yet that `5` here is not defined of a lower ranked type - it has no prescribed type at all. In terms of pointer math, it can be of a "type" wider than `unsigned long long` and so `5 + ~0LLU` is a large positive value. – chux - Reinstate Monica Jan 17 '19 at 16:27
  • @chux Indeed. Thus the conclusion that the behaviour is undefined. – eerorika Jan 17 '19 at 16:28
  • Just because `p == x + 5` does not necessarily mean that `p + ~0LLU == x + 5 + ~0LLU`. Each subexpression needs to have defined behavior *on its own* before the overall expression does, so you're begging the original question. – John Bollinger Jan 17 '19 at 16:28
  • Furthermore, the addition operator associates left-to-right, but overall is not associative in the mathematical sense, so `x + 5 + ~0LLU` would be evaluated as `(x + 5) + ~0LLU`, which has the same undefinedness problem that `p + ~0ULL` does. The behavior of `x + (5 + ~0LLU)` is not required to be equivalent, and that in this case the latter expression has defined behavior is irrelevant. – John Bollinger Jan 17 '19 at 18:45
  • The standard text says *i + j*, not `i + j` as you have put in your quote -- so it seems to me it is talking about the mathematical value rather than the typed expression. `0 ≤ i + j ≤ n` is certainly not a valid expression. – M.M Mar 11 '20 at 23:29