
I came across this post https://stefansf.de/post/pointers-are-more-abstract-than-you-might-expect/ which mentions

C11 § 6.5.9 paragraph 6 Two pointers compare equal if and only if both are null pointers, both are pointers to the same object (including a pointer to an object and a subobject at its beginning) or function, both are pointers to one past the last element of the same array object, or one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space.

as the rationale for why, in this program, the compiler may place b right after a and yet have &a+1 compare unequal to &b, even though the printed addresses are identical:

#include <stdio.h>
int main(void) {
    int a, b; int *p = &b; int *q = &a + 1;
    printf("%p %p %d\n", (void *)p, (void *)q, p == q);
}

This made me think that such a distinction cannot possibly be maintained across non-inlinable function boundaries (or translation unit boundaries or DSO boundaries), so I tried:

#include <stdio.h>

#pragma GCC optimize "O2"
void addrcmp(void *a1_p, void *b_p)
{
    printf("%p %c= %p\n", a1_p,"!="[a1_p==b_p],b_p);
    int *a1p = a1_p;
    int *bp = b_p;
    a1p[-1] *= 10;
    bp[0] *= 10;
}

int main()
{
#if __clang__ /*clang appears to lay the variables in reverse order*/
    int b=1, a=2;
#else
    int a=1, b=2;
#endif
    printf("%p %c= %p\n", (void*)(&a+1),"!="[&a+1==&b],(void*)&b);
    printf("%d %d\n", a, b);
    addrcmp(&a+1,&b);
    printf("%d %d\n", a, b);
}

and indeed, this is getting me

0x7ffc38b8ec5c != 0x7ffc38b8ec5c
1 2
0x7ffc38b8ec5c == 0x7ffc38b8ec5c
10 20

with gcc (6.4), and even

0x7ffe5d0cee6c == 0x7ffe5d0cee6c
2 1
0x7ffe5d0cee6c == 0x7ffe5d0cee6c
20 10

with clang (3.4).

My question is, does the 6.5.9p6 rule even make sense? It seems impossible to track something like that efficiently.
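For comparison, the clauses of 6.5.9p6 that do not depend on how unrelated objects happen to be laid out behave predictably. The following is a minimal sketch of those cases and is not taken from the post above; the struct `pair` and all function names are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper type, used only for this illustration. */
struct pair { int first; int second; };

/* Both operands are null pointers: they compare equal. */
bool null_pointers_equal(void) {
    int *p = NULL;
    char *q = NULL;
    return (void *)p == (void *)q;
}

/* A pointer to an object and a pointer to the subobject at its
 * beginning compare equal. */
bool object_and_first_member_equal(void) {
    struct pair s = {1, 2};
    return (void *)&s == (void *)&s.first;
}

/* Two pointers one past the last element of the same array compare
 * equal, however they were computed. */
bool one_past_end_equal(void) {
    int arr[4] = {0};
    int *end1 = arr + 4;
    int *end2 = &arr[3] + 1;
    return end1 == end2;
}

/* Runs all of the checks above. */
void run_checks(void) {
    assert(null_pointers_equal());
    assert(object_and_first_member_equal());
    assert(one_past_end_equal());
}
```

Calling `run_checks()` from `main` should pass silently on a conforming implementation. Only the last clause of 6.5.9p6, the one about an array that "happens to immediately follow" another, involves the layout question raised here.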

Petr Skocik
  • 3
    `a` and `b` are distinct variables, so there is no relationship between `&a` and `&b`. The compiler knows this, so it does not need to actually do the comparison; it substitutes `0` for `&a + 1 == &b`. – AlexP Jul 10 '18 at 23:26
  • 2
    gcc intentionally doesn't follow the standard, and it has been recommended that 6.5.9p6 be changed for C20 – M.M Jul 10 '18 at 23:30
  • And regarding your final question, one should always assume that all requirements in all language standards are qualified with *"if it is at all practicable for the compiler to follow this rule"*. – AlexP Jul 10 '18 at 23:33
  • 1
    I don't see any need for "tracking". Your tests show that the compiler deliberately refuses to acknowledge the pointer equality only when the situation is obvious at compile time (i.e. localized). In all other cases it makes an honest comparison of the pointers. No "tracking" necessary. The compiler apparently rejects the equality because it does not want users to rely on it. The resultant behavior is "weird", of course. – AnT stands with Russia Jul 10 '18 at 23:34
  • @AlexP Thanks. That's an excellent point. I believe `&a+1` makes the whole thing undefined due to http://port70.net/~nsz/c/c11/n1570.html#6.5.6p8. However, if I change the integers to 1-member integer arrays and do the equivalent operations, then the pointer arithmetic should no longer render the program undefined, and yet the results are the same. I guess that's gcc not following the standard, as M.M says. – Petr Skocik Jul 10 '18 at 23:38
  • 2
    What do you mean when you say that "`&a+1` makes the whole thing undefined"? I don't see any problems with 6.5.6p8. – AnT stands with Russia Jul 10 '18 at 23:40
  • @AnT "If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined." – Petr Skocik Jul 10 '18 at 23:43
  • 3
    @PSkocik: And? A "single" object acts as an "array of size 1" for the purposes of pointer arithmetic (see 6.5.6/7). There's nothing wrong with pointer arithmetic in `&a + 1`. It produces a pointer, which points past-the-end of the conceptual "array of size 1" represented by `a`. – AnT stands with Russia Jul 10 '18 at 23:49
  • 1
    @AlexP: The authors of the C Standard deliberately recognize the possibility of implementations that are conforming but of such poor quality as to be useless. It is also possible to have useful compilers that process non-conforming C dialects. What is the point of the C Standard if conforming implementations can disregard it whenever they want? – supercat Jul 10 '18 at 23:57
  • 2
    @AnT: GCC has a tendency to combine optimizations which would be fine if done individually, but are deadly if done together. That combined with a lack of any guarantees about when various optimizations will or will not be performed mean that it can only be counted upon to reliably process a tiny subset of the C language, since there's no telling in what circumstances a future compiler might manage to find "optimizations" [ways of breaking code] that today's compilers "miss". – supercat Jul 11 '18 at 00:00
  • @PSkocik: It is not that `&a + 1` is undefined. It is very well defined. The point is that the compiler *knows* that `a` and `b` are unrelated variables, and therefore a pointer derived from `&a` should never be equal to a pointer derived from `&b`. This is a perfectly valid inference. It is meaningless to compare pointers derived from unrelated variables. – AlexP Jul 11 '18 at 00:06
  • @AlexP That's what Octalist is arguing in the linked question's answer, but I think that inference is wrong (see my and AnT's comments on that answer). Equality comparison (as opposed to relational comparison) of unrelated objects shouldn't be UB, and 6.5.9/6 specifically allows for a pointer one past the end of an array (6.5.6/7 makes non-array objects equivalent to 1-sized arrays of them) to point to an array object that "happens to follow" the first object. Would then equality comparison of unrelated objects be undefined except when objects "happen to follow" each other? – Petr Skocik Jul 11 '18 at 00:26
  • @PSkocik: It is *not* undefined behavior. It is well-defined behavior. Two pointers known to be derived from unrelated variables are always unequal. A pointer one past the end of the array is still a pointer in that array, and *in principle* it should never compare equal to a pointer derived from another object. In some cases it is just not practicable for the compiler to keep track of where the pointers are ultimately derived from, and in such cases the program may get the wrong result for the comparison. – AlexP Jul 11 '18 at 00:57
  • @AlexP Glad you agree it's not UB. But http://port70.net/~nsz/c/c11/n1570.html#6.5.9p6 says the pointers should compare equal if, among other things, the &a+1 pointer points to an array object that _happens to follow_ `&a` in the address space (and the next point clarifies that non-arrays are considered as 1-member arrays for the purposes of these comparisons), which is the case here. So, as M.M said, gcc is doing it wrong with the inequality. – Petr Skocik Jul 11 '18 at 01:07
  • My approach is more "fun": [Dereferencing a 50% out of bound pointer (array of array)](https://stackoverflow.com/q/32100245/963864) – curiousguy Jul 11 '18 at 03:13
  • 1
    @AlexP: If the objects in question are declared `extern`, a quality compiler suitable for low-level programming on a platform where some languages (or linker configuration files) would allow more precise control over object placement than C does should allow for the possibility that a programmer might have reason to care about a relationship like the above. For example, some linkers for fixed-memory targets like embedded systems can generate a symbol that points just past the highest allocated object and another that points just below top of memory. It may be perfectly reasonable... – supercat Jul 11 '18 at 19:35
  • ...and realistic for code to do something like `extern uint32_t HEAP_START[], HEAP_END[]; for (uint32_t *p = HEAP_START; p != HEAP_END; p++) *p = 0;` C might not have any way to specify such placement for those objects, but a quality implementation suitable for low-level programming on a platform that does have such means should support the semantics implied thereby. – supercat Jul 11 '18 at 19:38
  • Unfortunately, I can't find any mode in which gcc behaves as a quality implementation suitable for low-level programming. In `-O0` it has correct semantics, but the code quality is generally dreadful. Even in `-O1` it can't handle things like `if ((uintptr_t)&someExtern == 0xC000000)` despite the fact that it has no way of knowing whether `someExtern` might have been placed there. – supercat Jul 11 '18 at 19:47
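The point made in the comments, that a single object behaves like an array of size 1 for pointer arithmetic (6.5.6/7), so that forming `&a + 1` is itself well defined, can be sketched as follows. This is only an illustration under that reading of the standard; the function names are hypothetical and do not come from the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Forming a pointer one past a single object and stepping back again
 * is valid, and the result compares equal to the original address. */
bool one_past_single_object_round_trips(void) {
    int a = 42;
    int *end = &a + 1;   /* one past the conceptual "array of size 1" */
    int *back = end - 1; /* back to the object itself */
    return back == &a && *back == 42;
}

/* Subtracting the object's address from its one-past pointer gives 1,
 * exactly as it would for a real one-element array. */
ptrdiff_t distance_to_one_past(void) {
    int a = 0;
    return (&a + 1) - &a;
}

/* Runs both checks above. */
void check_single_object_rule(void) {
    assert(one_past_single_object_round_trips());
    assert(distance_to_one_past() == 1);
}
```

Note that the sketch never dereferences the one-past pointer and never compares it against a different object, so it stays clear of the disputed part of 6.5.9p6.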
