Recently, I saw some C code like this:

#include <stdio.h>

int main(void) {
    int array[5] = {1, 2, 3, 4, 5};

    for (int* ptr = &array[0]; ptr != &array[5]; ptr++)
        printf("%d\n", *ptr);

    return 0;
}

Since operator [] has higher precedence than operator & in C, I think &array[5] is equivalent to &(*(array + 5)), which causes undefined behavior (we are not allowed to dereference array + 5). That is why I suspect the code above is ill-formed. (By the way, I know that ptr != array + 5 is okay.)

I tested this code using GCC 11.1.0 and Clang 12.0.0 with -O0 -fsanitize=address,undefined compiler flags, but both compilers interpreted &array[5] as array + 5, and no unexpected behavior happened.

Is &array[i] always equivalent to array + i (even when array[i] is invalid)? Thank you in advance.

Ulrich Eckhardt
  • 1
    You're correct that `&array[5]` is equivalent to `&(*(array + 5))`, but I think both are valid. Neither actually dereferences the pointer. They just use it to do pointer arithmetic. – Tom Karzes Jun 18 '21 at 04:12
  • 1
    In general you'd need a `sizeof()` to determine the offset per element. – pjs Jun 18 '21 at 04:59
  • 5
    Possible duplicate: https://stackoverflow.com/questions/38915128/address-of-dereferenced-pointer-construct answers your question with a quote from the standard: §6.5.3.2 "... If the operand is the result of a unary * operator, neither that operator nor the & operator is evaluated and the result is as if both were omitted" So no undefined behaviour. – Jerry Jeremiah Jun 18 '21 at 05:31
  • 1
    If I were you, I'd use `array + i` just in case. *"code above is ill-formed"* Ill-formed means "requires a compilation error", so it's not that. – HolyBlackCat Jun 18 '21 at 05:55
  • Thank you for the informative comments and answers, and I'm sorry that there are similar questions. Now I understand that <1> `&(*(array + 5))` is simplified away to `array + 5`. <2> Address `array + i (i = 0, 1, ..., 4)` is valid, and `array + 5` is also valid if it's not dereferenced, and `array + i (i > 5)` is invalid even if it's not dereferenced. <3> My usage of the word "Ill-formed" is not correct. – user16257574 Jun 18 '21 at 06:20
  • 1
    @pjs, I don't understand your statement. Pointer arithmetic is performed in element-sized steps already, so what would `sizeof` accomplish? Can you give an example? – Ulrich Eckhardt Jun 18 '21 at 06:54
  • 1
    This has been asked several times before , but SO's search function doesn't make it easy to find – M.M Jun 18 '21 at 06:57
  • @pjs for what purpose would one need the size of an object? It is sufficient that the compiler knows it. – Gerhardh Jun 18 '21 at 07:07
  • Dup of [Take the address of a one-past-the-end array element via subscript: legal by the C++ Standard or not?](https://stackoverflow.com/questions/988158/take-the-address-of-a-one-past-the-end-array-element-via-subscript-legal-by-the) (The question is also about C) – Language Lawyer Jun 18 '21 at 13:06
  • @M.M _This has been asked several times before , but SO's search function doesn't make it easy to find_ It is very easy to find dups. Admit, you just want reputation. – Language Lawyer Jun 18 '21 at 13:12
  • @LanguageLawyer 130K rep should be enough for anyone , I would rather dup-close tbh – M.M Jun 18 '21 at 21:41
  • @M.M rep, like money, is never enough – Language Lawyer Jun 18 '21 at 22:52
  • @LanguageLawyer: Why should a question about C be regarded as a duplicate of a question whose title explicitly refers to the C++ Standard? Even if some answers to the latter question talk about both languages (since it's useful to know in what ways they're the same and in what ways they differ), making it useful as supplemental information, I don't think that makes the C question a duplicate of a question which, as written, is *focused* on C++. – supercat Jun 21 '21 at 17:31

3 Answers

10

Firstly there is 6.5.2.1/2:

The definition of the subscript operator [] is that E1[E2] is identical to (*((E1)+(E2)))

Then the unary & operator is defined in 6.5.3.2/3:

[...] Similarly, if the operand is the result of a [] operator, neither the & operator nor the unary * that is implied by the [] is evaluated and the result is as if the & operator were removed and the [] operator were changed to a + operator.

This says explicitly that &x[y] means exactly (x) + (y).

M.M
2

even when array[i] is invalid

An answer from the sanitizer's point of view:

&array[i] and array+i always give the same pointer, but only &array[i] can trigger a runtime error from the sanitizer (at least in gcc). So, in this respect they are not equivalent.

Note that if i=5, as in your case, the sanitizer does not report an error as long as the pointer is not dereferenced, so the code above works even with the sanitizer turned on. If i is bigger than 5, however, the sanitizer immediately reports an error. For the code above, I suggest plain pointer arithmetic (if you insist on using pointers):

for (int* ptr = array; ptr < array+5; ptr++)
Laci
0

Although the C Standard defines the behavior of the [] operator in terms of the pointer addition and dereference operators, + and unary *, such that array[i] means *(array+i), neither clang nor gcc actually processes it that way. The behaviors are identical in circumstances where both forms would yield defined behavior, but they interact differently with the compilers' interpretation of the "strict aliasing rule".

The Standard, for example, makes no general allowance for accessing union objects using member-type lvalues. Given a declaration like:

union blob { unsigned short hh[4]; unsigned ww[2]; } u;

it would be absurd to suggest that the only way of accessing those members would be by using character pointers or functions like memcpy, especially since the Standard explicitly allows type punning via union objects. On the other hand, if one tests the following functions in clang and gcc:

int test1(int i, int j)
{
    u.hh[i] = 1;
    u.ww[j] = 2;
    return u.hh[i];
}
int test2(int i, int j)
{
    *(u.hh+i) = 1;
    *(u.ww+j) = 2;
    return *(u.hh+i);
}

both compilers will generate code for test1 that accommodates the possibility of type punning, but will generate code for test2 that unconditionally returns 1. This is allowable because the Standard characterizes both forms as Undefined Behavior so as to allow implementations to support, on a Quality of Implementation basis, whichever form(s) their customers would find useful.

While I don't know how the behaviors compare in the particular expression you list, the fact that the compilers interpret [] differently from a combination of + and * means that the Standard's definition of the former in terms of the latter should not be taken as an indication that they will be processed identically.

supercat
  • This seems to be just describing a compiler bug, `test2(0, 0)` is not UB (although is implementation-defined due to endianness) – M.M Jun 18 '21 at 21:53
  • @M.M: The maintainers of both clang and gcc view union type punning and the Common Initial Sequence guarantees as applying only to accesses made "directly" through an lvalue of union type, and as not being applicable in cases where they are made via pointer which is derived from a union lvalue in ways a compiler would have to be willfully blind not to notice. The authors of the Standard made no particular effort to avoid characterizing as UB actions which they expected that most implementations would process consistently absent a good reason to do otherwise, but instead expected that... – supercat Jun 18 '21 at 22:17
  • ...quality implementations would, as a form of "conforming language extension", behave meaningfully in more cases than mandated by the Standard. While such laxity makes the concept of "Strictly Conforming" program less useful than it might otherwise be, the vast majority of C programs would be Conforming but not Strictly Conforming anyhow. – supercat Jun 18 '21 at 22:23
  • I doubt your claim that the authors of the standard intended `test2` to be UB . Given that the standard explicitly defines test1 and test2 as identical, it seems like a poor decision on the part of the compiler vendor to break `test2`. Not a "quality implementation" as you would say – M.M Jun 19 '21 at 02:34
  • @M.M: I wouldn't say so much that they "intended" it to be UB as that they were indifferent to whether it was, because they would have expected quality implementations to process it meaningfully *whether or not the Standard required them to do so*. Many issues could have been resolved if the Standard had said, in essence, that if code derives a pointer or lvalue of one type from a pointer or lvalue of another, operations using the pointer are unsequenced between the point of derivation and the point of use. The rules would need to be a little more complicated to handle function and loop boundaries... – supercat Jun 19 '21 at 15:49
  • ...but the authors of the Standard didn't think it necessary to go into such detail since they expected people seeking to sell implementations would likely be able to judge better than the Committee how to avoid breaking their customers' code. What they failed to consider was that people who don't have to sell compilers would use the Standard to claim that customers' code was already broken because it's not *strictly* conforming, despite the fact that such code was *conforming*--just not *strictly*. – supercat Jun 19 '21 at 15:52