Why does the compiler assume that these seemingly equal pointers differ?

Question

It looks like GCC, with some optimization enabled, assumes that two pointers from different translation units can never be equal, even when they actually are.

Code:

main.c

#include <stdint.h>
#include <stdio.h>

int a __attribute__((section("test")));
extern int b;

void check(int cond) { puts(cond ? "TRUE" : "FALSE"); }

int main() {
    int * p = &a + 1;
    check(
        (p == &b)
        ==
        ((uintptr_t)p == (uintptr_t)&b)
    );
    check(p == &b);
    check((uintptr_t)p == (uintptr_t)&b);
    return 0;
}

b.c

int b __attribute__((section("test")));

If I compile it with -O0, it prints

TRUE
TRUE
TRUE

But with -O1, it prints

FALSE
FALSE
TRUE

So p and &b actually have the same value, but the compiler optimized away their comparison, assuming they can never be equal.

I can't figure out which optimization causes this.

It doesn't look like strict aliasing, because the pointers are of the same type, and toggling -fstrict-aliasing makes no difference.

Is this documented behavior, or is this a bug?

Have you tried printing out the pointer values, just to see if you can figure out what's going on? Also, try dumping the symbol table (using 'nm') to see how it's allocating them. Maybe the optimization is just changing the memory ordering. BTW, don't be surprised if the printf changes the behavior, because the compiler won't optimize around functions it can't control. — Timothy Miller, Mar 16 '16 at 12:35
Because check((uintptr_t)p == (uintptr_t)&b); always returns true, they must have the same value. Maybe it's using the wrong == operator above? — LawfulEvil, Mar 16 '16 at 12:36
If you can tell us why this pointer math is important to your problem, maybe we can help you solve it a different way. Why does the memory ordering matter? — Timothy Miller, Mar 16 '16 at 12:37
You assume `&a + 1` points to `b`. Is this guaranteed? I think that even the order of `a` and `b` in memory depends on linkage order. — Ivan Aksamentov - Drop, Mar 16 '16 at 12:38
@TimothyMiller yes, I printed them with printf(%p), the difference between &a and &b is really sizeof(int). Please notice (uintptr_t)p == (uintptr_t)&b. — Yuri Syro, Mar 16 '16 at 12:40
Keep in mind that the behavior of pointer comparison is defined only for a few specific cases and undefined for all others. — Theodoros Chatzigiannakis, Mar 16 '16 at 12:41
*Any pointer type may be converted to an integer type. Except as previously specified, the result is implementation-defined.* — 2501, Mar 16 '16 at 12:41
@LawfulEvil yes, they must, and they have. But optimizer thinks they can't. — Yuri Syro, Mar 16 '16 at 12:42
@CheeseRussian We still don't know why you're doing this. Can you make a single global array that's extern'd in other files? The compiler can reason over pointers within the same array, while it's making an assumption about what it thinks are independent variables. — Timothy Miller, Mar 16 '16 at 12:43
@TimothyMiller I'm working on an in-house C testing framework which relies on the placement of values for automatic test discovery. — Yuri Syro, Mar 16 '16 at 12:45
If this is for functional testing, can you do most of the testing with -O0 ? — Timothy Miller, Mar 16 '16 at 12:46
@Drop I don't think it is guaranteed by standard, but in this example this fact is proven by the third check. — Yuri Syro, Mar 16 '16 at 12:47
`uintptr_t` is only guaranteed to convert from/to a pointer to the same type correctly. There is no guarantee the integer values compare equal. — too honest for this site, Mar 16 '16 at 12:50
@TimothyMiller I'm building the global array at compile time, only by linking files containing `struct test_info` instances. — Yuri Syro, Mar 16 '16 at 12:50
@TimothyMiller it's not my code, it has been working for years, but broke on GCC 5.2. — Yuri Syro, Mar 16 '16 at 12:51
@TimothyMiller About -O0: I want to get to the heart of this problem and write code that works at any optimization level. — Yuri Syro, Mar 16 '16 at 12:54
@Drop I looked to the disasm, at -O1 compiler cuts comparison out. — Yuri Syro, Mar 16 '16 at 12:55
There is no array declaration. So the compiler is free to assume the pointers cannot compare equal. For the `uintptr_t` comparisons, see my other comment. The different sections might also be relevant. The code is definitively broken. Fix it, don't even think about "solving" by removing an optimisation. UB is UB. — too honest for this site, Mar 16 '16 at 12:56
Why can't you figure out which optimization triggered this? Did you try manually adding all optimizations of the -O1 level and then removing them one-by-one? — Theodoros Chatzigiannakis, Mar 16 '16 at 12:56
@CheeseRussian Here's another question. Why is it important to do this pointer comparison at compile time? You've obviously found a way to cast pointers so that they do compare correctly. Why not use that? Another thing you can do (since this is just a testing framework) is to cast all pointers you're comparing to "unsigned long". — Timothy Miller, Mar 16 '16 at 13:02
@Olaf Oh, I see. Pointer arithmetic only must work inside arrays. Could you repeat it as an answer? — Yuri Syro, Mar 16 '16 at 13:03
@TimothyMiller currently, I fixed it with `volatile`, but I want to be sure. — Yuri Syro, Mar 16 '16 at 13:04
@TheodorosChatzigiannakis I tried. `--help=optimizers` and man gcc show different sets of optimizations for -O1, but neither helps. This behavior is triggered by something else. — Yuri Syro, Mar 16 '16 at 13:06
@Cheese: `volatile` is definitively the wrong approach. That also does not fix anything, but breaks the code even more (can there be more UB?). You just might push the border a bit further; a future gcc, a more aggressive optimisation (why not use `-O3`?) or using a different compiler like clang can very well result in more subtle problems you will not detect that easily. — too honest for this site, Mar 16 '16 at 13:39
Out of curiosity: What happens if the two variables are in the same translation unit? — Peter - Reinstate Monica, Mar 16 '16 at 14:29
@PeterA.Schneider: That should not matter. If the compiler can prove the pointers point to different arrays (a single variable is treated here as a 1-entry array), it is free to do whatever it wants. There is no use in researching what a specific implementation does. — too honest for this site, Mar 16 '16 at 15:36
@PeterA.Schneider: So if it works with a specific compiler, on a specific time of compilation and execution, what does that say for a different compiler/other optimizer settings, a different time of day, rainy weather, etc.? - Nothing! It is undefined behaviour. Different from other UB, which can be explained by architectural requirements, this one cannot. On the contrary, it is quite obvious there are good reasons to generate different results even for the same target. So no, no knowledge gained. — too honest for this site, Mar 16 '16 at 19:54
After some discussion with M.M and reading the standard I have made a case why it is a bug. Cf. my edited post below. — Peter - Reinstate Monica, Mar 18 '16 at 13:49
This issue is being addressed by the standards committee as [N2012](http://www.open-std.org/JTC1/SC22/WG14/www/docs/n2012.htm) - see section "Pointer provenance / Q2". Have updated my answer. — M.M, Apr 26 '16 at 00:36
@Olaf No, volatile blocks some optimizations, by definition. If a future compiler breaks volatile, then you can't use it: it's broken. — curiousguy, Jun 09 '16 at 03:40
See also http://stackoverflow.com/questions/32043795/dereferencing-an-out-of-bound-pointer-that-contains-the-address-of-an-object-ar — curiousguy, Jun 09 '16 at 04:00
@Olaf "_There is no guarantee the integer values compare equal._" no, but they do compare equal! — curiousguy, Jun 09 '16 at 04:51
@curiousguy: It is not clear what your problem is with `volatile`. The behaviour is clearly specified. Also there is a defect report for the standard which clarifies the behaviour of current implementations is the correct one. It is expected to be included in the next version of the standard or some correction update. But even if not, it is the current expected behaviour. No discussion necessary. For the rest, see the link M.M. posted, you had enough time already. To repeat: just because a specific implementation works does not mean it is **not** UB! — too honest for this site, Jun 09 '16 at 13:33
@Olaf I don't have a problem with volatile, I am sure it would fix the pointer comparison problem. Until the DR has been approved, validated, integrated in a TC, the implementations are non conforming. I am not sure where you are getting the idea this behavior where `==` isn't an equality is "expected" by users. — curiousguy, Jun 09 '16 at 16:19
@Olaf I am not sure which DR you are talking about anyway. Anyway, it isn't a "clarification" when the solution proposed contradicts the reasonable (and only in this case) interpretation of a non crazy part of the standard. You are essentially claiming "nothing to see, move along" just in front of a major semantic crash. Not very credible! — curiousguy, Jun 09 '16 at 16:27
@curiousguy: A DR typically already means the issue has been accepted and the corrected behaviour in the report should be implemented. Even more, as it clearly states all major compilers (at least) implement the correct behaviour as the standard's is clearly wrong. Btw, there is only one DR about `volatile`, so it should be clear which one. I'll leave it at that, because you seem not to be able to discuss without offending. — too honest for this site, Jun 09 '16 at 17:14
@Olaf "_Even more, as it clearly states all major compilers (at least) implement the correct behaviour as the standard's is clearly wrong._" Huh? What is the correct behavior and how could the standard be "clearly" wrong when it states that `==` is an equality relation? — curiousguy, Jun 09 '16 at 19:38
I believe that gcc's behavior in this area is buggy, though the gcc maintainers disagree. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63611 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61502 There is a similar bug report for Clang, and it was fixed: https://bugs.llvm.org/show_bug.cgi?id=21327 — Keith Thompson, Aug 30 '17 at 22:47

too honest for this site · Accepted Answer · score 12 · 2016-03-19T17:29:24.363

There are three aspects in your code which result in general problems:

1. Conversion of a pointer to an integer is implementation-defined. There is no guarantee that converting two equal pointers yields integers with identical bits.
2. uintptr_t is guaranteed to convert from a pointer to the same type and back unchanged (i.e. the result compares equal to the original pointer), but nothing more. The integer values themselves are not guaranteed to compare equal; e.g. there could be unused bits with arbitrary values. See the standard, 7.20.1.4.
3. And (briefly) two pointers can only compare equal if they point into the same array or right behind it (last entry plus one), or both are null pointers. For any other constellation, they compare unequal. For the exact details, see the standard, 6.5.9p6.

Finally, there is no guarantee about how variables are placed in memory by the toolchain (typically the linker for static variables, the compiler for automatic variables). Only an array or a struct (i.e. aggregate types) guarantees the ordering of its elements.
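A minimal sketch of the two placements the standard does pin down, assuming the objects can be moved into one array or one struct (the names `slots`, `struct pair` and the helper functions are hypothetical, chosen only for illustration):

```c
#include <assert.h>
#include <stddef.h>

/* Two ints as elements of one array: their relative placement is fixed by the language. */
static int slots[2];

/* Two ints as members of one struct: addresses increase in declaration order,
   although padding may sit between the members. */
struct pair {
    int first;
    int second;
};

/* Returns 1: &slots[0] + 1 is required to compare equal to &slots[1]. */
int array_neighbours_equal(void) {
    int *p = &slots[0] + 1;
    return p == &slots[1];
}

/* Returns 1: the later member always has the larger offset. */
int struct_members_ordered(void) {
    return offsetof(struct pair, first) < offsetof(struct pair, second);
}
```

Unlike two separate globals, neither result depends on what the linker decides to do.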

For your example, 6.5.9p7 also applies. It basically treats a pointer to a non-array object, for comparison, like a pointer to the first element of an array of size 1. This does not cover an incremented pointer past the object, like &a + 1. What matters is the object the pointer is based on: object a for pointer p, and object b for pointer &b. The rest can be found in paragraph 6.

None of your variables is an array (last part of paragraph 6), so the pointers need not compare equal, even for &a + 1 == &b. The last "TRUE" might arise from gcc assuming the uintptr_t comparison returns true.

gcc is known to optimise aggressively while strictly following the standard. Other compilers are more conservative, but that results in less optimised code. Please don't try "solving" this by disabling optimisation or other hacks, but fix it using well-defined behaviour. It is a bug in the code.
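Independent of how paragraph 6 is read for comparisons against other objects, the additive operators also treat a lone object as an array of length one (6.5.6p7), so forming, subtracting and looping up to &a + 1 is well defined. A minimal sketch, where `a` mirrors the question's variable and the helper names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

static int a;  /* a lone object, treated as an int[1] by the additive and equality operators */

/* (&a + 1) - &a is well defined: both pointers refer to the same pseudo-array int[1]. */
ptrdiff_t one_past_distance(void) {
    int *end = &a + 1;   /* one past the end; &a + 2 would be undefined */
    return end - &a;
}

/* Iterates over a as if it were int[1]; the loop body runs exactly once. */
int visit_count(void) {
    int visits = 0;
    for (int *q = &a; q != &a + 1; ++q) {
        ++visits;   /* *q is valid here; the end pointer itself is never dereferenced */
    }
    return visits;
}
```

What this sketch does not settle is whether &a + 1 == &b for a separately declared b, which is the contested point in the discussion.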
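One well-defined alternative for the test-discovery use case is an explicit registry: every test lives in a single array, so all pointer arithmetic and comparisons stay inside one array object. This is a sketch under assumptions; the real framework's `struct test_info` is not shown in the question, so its fields and the test functions below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout; the framework's actual struct test_info may differ. */
struct test_info {
    const char *name;
    int (*run)(void);  /* returns 0 on success */
};

static int test_addition(void) { return (1 + 1 == 2) ? 0 : 1; }
static int test_shift(void)    { return ((1u << 3) == 8u) ? 0 : 1; }

/* All tests live in one array, so walking it with pointer arithmetic is well defined. */
static const struct test_info registry[] = {
    { "addition", test_addition },
    { "shift",    test_shift },
};

/* Runs every registered test, stores the number of tests in *count_out,
   and returns the number of failures. */
int run_all_tests(size_t *count_out) {
    size_t n = sizeof registry / sizeof registry[0];
    int failures = 0;
    for (const struct test_info *t = registry; t != registry + n; ++t) {
        if (t->run() != 0) {
            ++failures;
        }
    }
    if (count_out) {
        *count_out = n;
    }
    return failures;
}
```

The cost is that each new test must be added to the array by hand instead of being discovered from the linker section; the benefit is that nothing depends on how separate objects happen to be laid out.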

edited Mar 19 '16 at 17:29

answered Mar 16 '16 at 13:12

I believe aspect 1 is implementation-defined behavior, and not undefined behavior. – Vaughn Cato Mar 16 '16 at 13:22
+1, code that breaks when an optimization is turned on should be fixed. The fact that it broke across different GCC versions is even more of a red flag that it should not be ignored. – Theodoros Chatzigiannakis Mar 16 '16 at 13:27
@VaughnCato: If it were implementation-defined, the standard would state it and an implementation would have to document it. Nevertheless there are some aspects of the conversion **specific** to the implementation, of course. Maybe you can mask out bits not used for a pointer before comparison. But that still would leave the UB of the other code. UB means **anything** can happen - from nothing bad (worst case) to forced termination on the error (best case) and your computer jumping out of the window. – too honest for this site Mar 16 '16 at 13:28
My reading is that `(uintptr_t)p == (uintptr_t)&b` involves two things. One is a conversion from pointer to integer type, which has implementation-defined behavior (6.3.2.3/6), and the other is a comparison of two values of integer type, which is well-defined behavior. I don't see what is undefined. – Vaughn Cato Mar 16 '16 at 13:43
@VaughnCato: Point taken. I left out one step. Edited my answer. There might just be unspecified bits in the converted integer. – too honest for this site Mar 16 '16 at 13:49
Disagree/unclear with "uintptr_t is guaranteed to convert from a pointer to the same type then back unchanged" pointer --> `uintptr_t` --> pointer results in an equivalent pointer - not necessarily the same pointer bit pattern. – chux - Reinstate Monica Mar 16 '16 at 14:38
@chux: After converting to, then from `uintptr_t` the value must compare equal. That's what I mean by unchanged. Notice the "(briefly)" and see the standard I referenced. – too honest for this site Mar 16 '16 at 14:43
You write "Your code does not have an array", however see 6.5.9/7 "For the purposes of these operators, a pointer to an object that is not an element of an array behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type" – M.M Mar 18 '16 at 00:13
@M.M: Yes,but only for a pointer to the object itself (i.e. the first element). But OP increments the pointer, thus it does not point to the first element, but past it. While this is allowed for an array, it is not for a single variable according to the section you cite. I don't think your downvote is justified. – too honest for this site Mar 18 '16 at 03:16
Your answer reaches a wrong conclusion; `&a + 1 == &b` may be true. So I feel the vote is justified. Your statement "While this is allowed for an array, it is not for a single variable" is rubbish because the standard clearly says the exact opposite: "For the purposes of these operators, a pointer to an object that is not an element of an array behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type" – M.M Mar 18 '16 at 04:33
@M.M: I just tried to clarify what I mean, but have trouble formulating. I have a strong feeling gcc is correctly optimising. I'm just not sure how to phrase and fortify it with the standard understandably. Problem is `p` does **not** point to an object, but one past an object. Imo that makes p7 obsolete. In fact, I see one of the constraints unsatisfied -> UB. We all forgot about those. ... – too honest for this site Mar 18 '16 at 04:59
@M.M: ... Problem is, the constraints don't use "pointer to objects of ...", but "pointers to ... versions of ...". I'm not sure if this refers to objects, but how can pointers which are neither null pointers, nor point to an object nor point past an array be compared at all? - I'll have to sleep on it. – too honest for this site Mar 18 '16 at 04:59
@Olaf "pointer to object *type*" means a pointer which is not a pointer to function type (nothing to do with what it might currently be pointing at). *compatible type* is similar. The constraints are certainly satisfied; points 6 and 7 are just for deciding whether the result is `1` or `0` . In C++ this same situation is *unspecified* which places lower demand on the optimizer, I agree that it is a bit annoying that the optimizer needs to check `&a + 1` in case another object is just after, but this rule is useful if both objects were a struct member for example. – M.M Mar 18 '16 at 06:27
Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/106729/discussion-between-olaf-and-m-m). – too honest for this site Mar 18 '16 at 14:25

score 8 · Answer 2 · edited May 23 '17 at 12:34

p == &b is a pointer comparison and is subject to the following rules from the C Standard (6.5.9 Equality operators, point 6):

Two pointers compare equal if and only if both are null pointers, both are pointers to the same object (including a pointer to an object and a subobject at its beginning) or function, both are pointers to one past the last element of the same array object, or one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space.
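The first clauses of that rule can be checked without relying on any linker layout. A minimal sketch, assuming a hypothetical `struct wrapper` whose first member plays the role of "a subobject at its beginning" (all names here are illustrative):

```c
#include <assert.h>
#include <stddef.h>

struct wrapper {
    int first;   /* subobject at the very beginning of the struct */
    int second;
};

static struct wrapper w;

/* Both null pointers: they compare equal. */
int nulls_equal(void) {
    int *p = NULL;
    int *q = NULL;
    return p == q;
}

/* An object and the subobject at its beginning share an address (see also 6.7.2.1p15),
   so the pointers, converted to a common type, compare equal. */
int object_and_initial_member_equal(void) {
    return (void *)&w == (void *)&w.first;
}

/* Two pointers one past the last element of the same array compare equal. */
int one_past_same_array_equal(void) {
    static int arr[3];
    return arr + 3 == &arr[2] + 1;
}
```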

(uintptr_t)p == (uintptr_t)&b is an arithmetic comparison and is subject to the following rules (6.5.9 Equality operators, point 4):

If both of the operands have arithmetic type, the usual arithmetic conversions are performed. Values of complex types are equal if and only if both their real parts are equal and also their imaginary parts are equal. Any two values of arithmetic types from different type domains are equal if and only if the results of their conversions to the (complex) result type determined by the usual arithmetic conversions are equal.

These two excerpts require very different things from the implementation. And it is clear that the C specification places no requirement on an implementation to mimic the behavior of the former kind of comparison in cases where the latter kind is invoked and vice versa. ~~The implementation is only required to follow this rule (7.18.1.4 Integer types capable of holding object pointers in C99 or 7.20.1.4 in C11):~~

The [uintptr_t] type designates an unsigned integer type with the property that any valid pointer to void can be converted to this type, then converted back to pointer to void, and the result will compare equal to the original pointer.

(Addendum: The above quote isn't applicable in this case, because the conversion from int* to uintptr_t does not involve void* as an intermediate step. See Hadi's answer for an explanation and citation on this. Still, the conversion in question is implementation-defined and the two comparisons you are attempting are not required to exhibit the same behavior, which is the main takeaway here.)

As an example of the difference, consider two pointers that point at the same address of two different address spaces. Comparing them as pointers shouldn't return true, but comparing them as unsigned integers might.

&a + 1 is an integer added to a pointer, which is subject to the following rules (6.5.6 Additive operators, point 8):

When an expression that has integer type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integer expression. In other words, if the expression P points to the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N (where N has the value n) point to, respectively, the i+n-th and i−n-th elements of the array object, provided they exist. Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.

I believe that this excerpt shows that pointer addition (and subtraction) is defined only for pointers within the same array object or one past the last element. And because (1) a is not an array and (2) a and b aren't members of the same array object, it seems to me that your pointer math operation invokes undefined behavior and your compiler takes advantage of it to assume that the pointer comparison returns false. Again as pointed out in Hadi's answer (and in contrast to what my original answer assumed at this point), pointers to non-array objects can be considered pointers to array objects of length one, and thus adding one to your pointer to the scalar does qualify as pointing to one past the end of the array.

Therefore your case seems to fall under the last part of the first excerpt mentioned in this answer, making your comparison well-defined to evaluate to true if and only if the two variables are linked in sequence and in ascending order. Whether this is true for your program is left unspecified by the standard and it's up to the implementation.
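The last clause of the rule can also be exercised in a situation where the standard itself fixes the layout: two rows of a two-dimensional array are distinct array objects, and array elements are contiguous, so the second row immediately follows the first. A minimal sketch, assuming the two objects could be made rows of one array (the name `m` is hypothetical):

```c
#include <assert.h>

/* m[0] and m[1] are two int[1] arrays; array elements are contiguous,
   so m[1] immediately follows m[0] in the address space. */
static int m[2][1];

/* One past the end of m[0] versus the start of m[1]: the last clause of 6.5.9p6 makes this 1. */
int adjacent_subarrays_equal(void) {
    int *end_of_first = m[0] + 1;   /* one past the last element of m[0] */
    int *start_of_second = m[1];    /* first element of m[1] */
    return end_of_first == start_of_second;
}
```

With separately declared variables, the same comparison only yields 1 if the toolchain happens to place them that way.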

Note that comparing the pointer values after casting them to integers is technically undefined behaviour. — rubenvb, Mar 16 '16 at 13:11
Yes, it's "technically undefined" but works for most architectures. That being said, making it "volatile" is probably better. — Timothy Miller, Mar 16 '16 at 13:12
Are these quotes from the standard or are they your own words? — trent, Mar 16 '16 at 13:29
@trentcl Anything that's in a quote box is directly from the standard (I forgot to mention that in the answer, editing). — Theodoros Chatzigiannakis, Mar 16 '16 at 13:31
@TimothyMiller: For a 64 bit architecture, not all 64 bits are typically used for the address. So there can be bits used for other purposes. Or take old 16 bit x86/real mode: A 32 bit far pointer consisted of two fields (segment:offset) with different combinations resulting in the same physical address. Comparing two pointers can respect this, comparing two integer conversions of the pointer likely will not. — too honest for this site, Mar 16 '16 at 13:34
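The real-mode case can be modelled with plain integers. This sketch does not use actual far pointers; it assumes the 32-bit representation holds the segment in the high word and the offset in the low word, and that the physical address is segment * 16 + offset. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Model of a 16-bit x86 real-mode far pointer. */
struct far_ptr {
    uint16_t segment;
    uint16_t offset;
};

/* The integer obtained by reinterpreting the 32-bit segment:offset pair. */
uint32_t far_as_integer(struct far_ptr p) {
    return ((uint32_t)p.segment << 16) | p.offset;
}

/* The physical address the CPU actually uses: segment * 16 + offset. */
uint32_t far_to_linear(struct far_ptr p) {
    return ((uint32_t)p.segment << 4) + p.offset;
}
```

For example, 0x1234:0x0005 and 0x1230:0x0045 both address 0x12345, yet their integer forms 0x12340005 and 0x12300045 differ.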

Hadi Brais · Answer 3 · score 4 · 2016-03-19T11:44:27.097

While one of the answers has already been accepted, the accepted answer (and all other answers, for that matter) is critically wrong, as I'll explain before answering the question. I'll be quoting from the same C standard, namely n1570.

Let's start with &a + 1. In contrast to what @Theodoros and @Peter have stated, this expression has defined behavior. To see this, consider section 6.5.6 paragraph 7 "Additive operators" which states:

For the purposes of these operators, a pointer to an object that is not an element of an array behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type.

and paragraph 8 (in particular, the emphasized part):

When an expression that has integer type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integer expression. In other words, if the expression P points to the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N (where N has the value n) point to, respectively, the i+n-th and i−n-th elements of the array object, provided they exist. Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.

The expression (uintptr_t)p == (uintptr_t)&b has two parts. The conversion from a pointer to a uintptr_t is NOT defined by section 7.20.1.4 (in contrast to what @Olaf and @Theodoros have said):

The following type designates an unsigned integer type with the property that any valid pointer to void can be converted to this type, then converted back to pointer to void, and the result will compare equal to the original pointer:

uintptr_t

It's important to recognize that this rule applies only to valid pointers to void. However, in this case, we have a valid pointer to int. A relevant paragraph can be found in section 6.3.2.3 paragraph 1:

A pointer to void may be converted to or from a pointer to any object type. A pointer to any object type may be converted to a pointer to void and back again; the result shall compare equal to the original pointer.

This means that (uintptr_t)(void*)p is allowed according to this paragraph and 7.20.1.4. But (uintptr_t)p and (uintptr_t)&b are ruled by section 6.3.2.3 paragraph 6:

Any pointer type may be converted to an integer type. Except as previously specified, the result is implementation-defined. If the result cannot be represented in the integer type, the behavior is undefined. The result need not be in the range of values of any integer type.

Note that uintptr_t is an integer type as stated in section 7.20.1.4 mentioned above and therefore this rule applies.
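To stay within what 7.20.1.4 actually guarantees, the conversion can go through void * explicitly. A minimal sketch, assuming the implementation provides the optional uintptr_t type (the helper names are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Converts through void * first, which is the case 7.20.1.4 covers. */
uintptr_t int_ptr_to_uint(int *p) {
    return (uintptr_t)(void *)p;
}

/* Converts back the same way; by 7.20.1.4 and 6.3.2.3p1 the result
   compares equal to the original pointer. */
int *uint_to_int_ptr(uintptr_t u) {
    return (int *)(void *)u;
}
```

Note that only the round trip is guaranteed; whether two such integers compare equal for equal pointers is still left to the implementation.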

The second part of (uintptr_t)p == (uintptr_t)&b is comparing for equality. As previously discussed, since the result of conversion is implementation-defined, the result of equality is also implementation defined. This applies irrespective of whether the pointers themselves are equal or not.

Now I'll discuss p == &b. The third point in @Olaf's answer is wrong and @Theodoros's answer is incomplete regarding this expression. Section 6.5.9 "Equality operators" paragraph 7:

For the purposes of these operators, a pointer to an object that is not an element of an array behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type.

and paragraph 6:

Two pointers compare equal if and only if both are null pointers, both are pointers to the same object (including a pointer to an object and a subobject at its beginning) or function, both are pointers to one past the last element of the same array object, or one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space.

In contrast to what @Olaf has said, comparing pointers using the == operator never results in undefined behavior (which may occur only when using relational operators such as <= according to section 6.5.8 paragraph 5 which I'll omit here for brevity). Now since p points to the next int relative to a, it will be equal to &b only when the linker has placed b in that location in the binary. Otherwise, they are unequal. So this is implementation-dependent (the relative order of a and b is unspecified by the standard). Since the declarations of a and b use a language extension, namely __attribute__((section("test"))), the relative locations are indeed implementation-dependent by J.5 and 3.4.2 (omitted for brevity).
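Because == is defined for any two valid object pointers of compatible type while < and >= are not, a portable "does p point into this array?" check can be written with equality alone. A minimal sketch (the function name is hypothetical); the linear scan is the price of avoiding the relational operators:

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if p points at one of base[0] .. base[n-1], else 0.
   Only == is used: it is defined even when p points into a different object,
   whereas p >= base && p < base + n would be undefined in that case (6.5.8p5). */
int points_into(const int *p, const int *base, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        if (p == base + i) {
            return 1;
        }
    }
    return 0;
}
```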

We conclude that the results of check(p == &b) and check((uintptr_t)p == (uintptr_t)&b) are implementation-dependent. So the answer depends on which version of which compiler you are using. I'm using gcc 4.8 and by compiling with default options except for the level of optimization, the output I get in both -O0 and -O1 cases is all TRUE.

edited Mar 19 '16 at 11:44

answered Mar 17 '16 at 22:43

@M.M Indeed. I think you're referring to the relative order of `a` and `b`. I meant implementation-dependent. I'll clarify. – Hadi Brais Mar 18 '16 at 00:33
p7 only treats the case of taking the pointer to the object itself. Not comparing an incremented version thereof. p6 is very clear about adjacent **arrays**, not scalars. (Btw: I corrected the flaw with the UB for comparing unrelated object pointers. It really was badly worded). – too honest for this site Mar 18 '16 at 04:06
If the constraints are not met, comparing pointers can very well be UB. `p` does not point to any object. There is no "next `int`" after `a` in the abstract machine. – too honest for this site Mar 18 '16 at 05:12
@Olaf p6 says "Two pointers compare equal if and only if...". This wording is extremely important. The "if and only if" part says that if any of the mentioned constraints are met then the result of the pointer equality is TRUE and if none of the constraints are met then the result of the equality must be FALSE (they cannot compare equal). In addition, p6 covers all pointers including invalid pointers (see the wording of 7.20.1.4 for comparison of wording). Finally, p6 says no where that UB is possible (see 6.5.6p8 for comparison of wording). Therefore, comparing pointers using == never... – Hadi Brais Mar 18 '16 at 11:44
...results in UB. Please see 6.5.8p5 for when might this happen. – Hadi Brais Mar 18 '16 at 11:44
@Olaf Also by "next `int`" I mean "a pointer to one past the end of one array object". That's what `p` is according to the standard in the standard's words. However, because the location of `b` is unspecified (implementation-dependent), whether the constraints mentioned in p6 are met or not is implementation-dependent. If the linker placed `b` right after `a`, the constraint that says "both are pointers to one past the last element of the same array object" applies. Overall, no UB. – Hadi Brais Mar 18 '16 at 12:14
That "the results of check(p == &b) and check((uintptr_t)p == (uintptr_t)&b) are implementation-dependent" does not mean they are *unrelated*: Because the back-and-forth conversion int* -> void* -> uintptr_t -> void* -> int* is guaranteed to result in the same int* value, the uintptr_t values cannot compare equal for unequal int* values, at least not if uintptr_t has no padding bits. (Back-converting the same uintptr_t value will necessarily result in the same int* value because no other information is present.) This means the OP describes a gcc bug. – Peter - Reinstate Monica Mar 18 '16 at 13:08
@PeterA.Schneider Yes, they might be related and yes it might be a bug (we'll have to know the version of gcc to get to the bottom of this). However, irrespective of whether there is a bug or not, the behavior of this program as far as the standard is concerned is NOT UB. – Hadi Brais Mar 18 '16 at 14:12
@PeterA.Schneider I would like to add that the conversion from `int*` to `uintptr_t` doesn't go through `void*`. The standard discusses this and my answer emphasizes it. – Hadi Brais Mar 18 '16 at 14:15
You are right, it didn't go through `void*` in the OP, but it should in order to be more meaningful (and my "they are related" was meant when doing the two-step conversion). Not that it should matter in practice... But we are deep in standard land. I have detailed this argument that this is a bug in my edited answer below. – Peter - Reinstate Monica Mar 18 '16 at 14:31
@HadiBrais: see 6.5.9p2! If these constraints are not met, you **do** very well invoke UB. – too honest for this site Mar 18 '16 at 15:07

You are right, I completely missed the part about one-length arrays (and it makes a lot of sense, if I take a few steps back). I've amended my answer at the places where you spotted the errors. – Theodoros Chatzigiannakis Mar 18 '16 at 22:04
@Olaf Yes. But in this case, the second constraint "both operands are pointers to qualified or unqualified versions of compatible types;" applies. Both operands `p` and `&b` are pointers to unqualified type `int`. Therefore, no UB. Note that nowhere does it say that the pointers have to be valid. – Hadi Brais Mar 19 '16 at 00:32
@HadiBrais: Problem is, `p` neither points to a single object (p7) nor past or into an array (p6). – too honest for this site Mar 19 '16 at 02:58
@Olaf `p=&a+1` The pointer `&a` points to an object and by 6.5.6p7 it points to the first element of an array of length one. Now by 6.5.6p8, the expression `&a+1` is a pointer to one past the last element of the array. This is all carefully mentioned in my answer. In addition, even if `p` doesn't point to an object, 6.5.9p7 doesn't stipulate that the pointer must point to an object, it says if the pointer points to an object then it is treated as a pointer to an array. That's it. 6.5.9p6 does NOT say "Two valid pointers compare equal if and only if..." It says... – Hadi Brais Mar 19 '16 at 11:33
... "Two pointers compare equal if and only if..." That is, any two pointers. To put it all together, the constraints are only on the types of the pointers as specified in their declarations. But they can point to anything. This is different from other relational operators discussed in 6.5.8. – Hadi Brais Mar 19 '16 at 11:34
@HadiBrais: Not sure why you bring in 6.5.6 here. I never doubted that the addition operation as such increments as expected. We are talking about `p` here. Problem is 6.5.9p7, which **requires** a pointer pointing to a non-array object, not one past it. Interestingly, it very well applies to `&b`. – too honest for this site Mar 19 '16 at 15:57
@Olaf Exactly right. 6.5.9p7 doesn't apply to `p` which is according to the standard one past the last element of the array. Also 6.5.9p7 applies to `&b`. Therefore, the only way for `p == &b` to be TRUE is when the last condition in 6.5.9p6 is satisfied. That is: one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space. – Hadi Brais Mar 19 '16 at 16:47
@HadiBrais: Problem is `a` is **not** an array, but a scalar. My problem is actually not the optimised code, but the not optimised. But I very well see the practical problems. I suspect the standard should just make such situations unspecified behaviour to allow for optimisations. That would also be logical. Maybe I/we just misread the constraints. – too honest for this site Mar 19 '16 at 16:56
@Olaf That's why I mentioned 6.5.6p7 to you. It says that `a` is to be treated as if it was an array. And by 6.5.6p8, `p` is one past the last element of the array. – Hadi Brais Mar 19 '16 at 17:01
@HadiBrais: Again: 6.5.6 is about the addition operators! It describes how such a pointer is to be treated for addition/subtraction. Once that operation is done, you have another pointer - done. For the comparison of **that** resulting pointer's value 6.5.9 **only** applies. – too honest for this site Mar 19 '16 at 17:03
@Olaf This is a minor disagreement between us (it would be great if someone can help us settle this). But the major disagreement is that you say in your answer that the program invokes UB. I think you are referring to pointer comparison. I'm trying to explain that pointer comparison using == never results in UB as long as one of the constraints in 6.5.9p2 is met which is the case here. It doesn't matter what these pointers are pointing to. – Hadi Brais Mar 19 '16 at 17:12
@Olaf Yes. It says at the end "Please don't try "solving" this by disabling optimisation or other hacks, but fix it by removing the UB. It is a bug in the code." – Hadi Brais Mar 19 '16 at 17:19
@HadiBrais: Uh, I have overlooked that during my edits. Anyway, changed. – too honest for this site Mar 19 '16 at 17:30

M.M · Answer 4 · 2016-04-26T00:35:34.650


According to C11 6.5.9/6 and C11 6.5.9/7, the test p == &b must give 1 if a and b are adjacent in the address space.

Your example shows that GCC appears to not fulfill this requirement of the Standard.

Update 26/Apr/2016: My original answer contained suggestions about modifying the code to remove other potential sources of UB and isolate this one condition.

However, it's since come to light that the issues raised by this thread are under review - N2012.

One of their recommendations is that p == &b should be unspecified, and they acknowledge that GCC does in fact not implement the ISO C11 requirement.

So I have removed the remaining text from my answer, as it is no longer necessary to prove a "compiler bug", since the non-conformance (whether you want to call it a bug or not) has been established.

edited Apr 26 '16 at 00:35

answered Mar 18 '16 at 00:09

M.M


Using a language extension doesn't make the program have undefined behavior. It just makes it NOT strictly conforming (call it loosely conforming). See appendix J and section 4. – Hadi Brais Mar 18 '16 at 00:49
@HadiBrais J.4 in C11 is "Locale-specific behaviour", I guess you mean J.5 "Common extensions", but nevertheless, the standard cannot define the behaviour of any such extension. – M.M Mar 18 '16 at 00:53

Yes but that doesn't make it UB. It's OK and reliable to use a language extension. – Hadi Brais Mar 18 '16 at 00:59
Anything not defined is undefined. For example no conclusions can be drawn about OP's program because the meaning of `__attribute__((section("test")))` could involve changes to the results of pointer comparisons involving that pointer. (Whether or not the compiler documents as much). – M.M Mar 18 '16 at 01:01
"The test p == &b must give 1 if a and b are adjacent in the address space" - By which criterion of the cited paragraphs? They both point to different objects. p7 clearly only refers to the object itself, i.e. `&a`, but not past the object (`&a + 1`). That is only allowed for an array. Additionally, the pointers are to different objects. Assuming `a` was an array of length `1`, `p` would still be related to object `a`, not `b`. Thus comparing the pointers is undefined behaviour. – too honest for this site Mar 18 '16 at 03:31

@Olaf there is no rule that using equality operators (6.5.9) with pointers to different objects is undefined behaviour. You are thinking of relational operators (6.5.8). "By which criterion" - the sentence in 6.5.9/6 ending with "that happens to immediately follow the first array object in the address space". "not one past the object" - p7 clearly says that the object behaves like the first element of an array of length 1, and 6.5.6/8 covers adding `1` to the address of the first element of an array of length 1 (or any other length). – M.M Mar 18 '16 at 04:31
@M.M: Yes, sorry, I'm lacking the proof for UB in the comment. See the comment at my own answer. I still suspect UB, but not from these paragraphs. It is a weaker than weak argument, but I have a feeling gcc is correctly exploiting UB here; just not sure how to prove it with the standard. Feel free to drop me a comment if you find something enlightening in either direction. – too honest for this site Mar 18 '16 at 05:10
Can you explain how, if uintptr_t is big enough, the conversion to and comparison of uintptr_t can be *true* if the pointer comparison isn't? I could make a theoretical (but insane) case for unused bits in uintptr_t which make such a comparison *wrong*... – Peter - Reinstate Monica Mar 18 '16 at 05:43
@PeterA.Schneider see Olaf or Hadi's answers for discussion of uintptr_t. Also note that `uintptr_t` might not even exist – M.M Mar 18 '16 at 06:31
@M.M I have read that; I understand that the language does not make assumptions about the representation of pointers as uintptr_t and consequently about the result of operations on such values. But even if an implementation would in practice leave bits uninitialized (unlikely) it could lead to a *false negative* equality, not to a *"false positive"* as in the OP's example. The same uintptr_t value would necessarily convert back to the same void pointer, which would violate the "Back and forth preservation guarantee" the standard makes if they were originally different. (ctd.) – Peter - Reinstate Monica Mar 18 '16 at 08:31
... The same connection can be made for `int *` to `void *` and back. I understand that the language possibly doesn't guarantee that two formerly unequal pointers still *compare* unequal after conversion to `void *`. But there still must be a (possibly hidden) difference, because they must convert back differently, and that difference must be carried through a conversion to uintptr_t, because that conversion must also be reversible, preserving the original inequality. Since we are comparing integers with uintptr_t, equality is well defined: the bits are equal. There is no room for hidden unequal bits. – Peter - Reinstate Monica Mar 18 '16 at 08:38

As a clarification: The equality of the two uintptr_t is actually *not* false positive; as the OP says, the compiler (wrongly, as we have established) just doesn't perform the actual pointer comparison and emits false there with optimization on. The uintptr_t comparison is proper (that's why I put the "false positive" in double quotes. It's true positive). – Peter - Reinstate Monica Mar 18 '16 at 08:42
@PeterA.Schneider I mean, I don't want to discuss uintptr_t in these comments as it is mainly addressed by the other answers, so to keep things tidy discuss it there; my answer focuses on just the `p == &b` test – M.M Mar 18 '16 at 08:49
@M.M Regarding language extensions, you're missing the point here. Locale-specific behavior (3.4.2) is separate and different from undefined behavior (3.4.3). In fact, one of the main goals of introducing locale-specific behavior in the standard is to support language extensions without triggering UB. Section 3.4.2 clearly states that the behavior must be documented by the implementor. – Hadi Brais Mar 19 '16 at 00:47
@HadiBrais: One point of sloppiness in the Standard is that it does not make clear in what cases an implementation's specification of behavior in one situation where the Standard would otherwise impose no requirements implicitly defines behavior in related situations. I don't think gcc's optimization here is legitimate, since even if no means existed by which a programmer could deliberately cause variables to be adjacent, gcc has no reason to believe that they couldn't be adjacent by chance, and a programmer would be entitled to expect the behavior of p==&b to be consistent. – supercat Apr 13 '16 at 19:18
@HadiBrais: If e.g. one of the variables were an `extern` whose address was in the middle of the stack, I think gcc might be somewhat more entitled to assume that it can't equal the address of an automatic variable if neither that variable nor the one immediately below has ever had its address taken, on the basis that the pointer in question cannot possibly be a valid pointer to a live object, and the compiler is under no obligation to treat consistently any comparisons between pointers to dead and live objects. – supercat Apr 13 '16 at 19:21
@supercat Agreed. That's why I think all of the answers including my answer are incomplete. They fail to explain exactly why gcc is exhibiting this apparently unjustified inconsistency. It could be a bug in gcc. – Hadi Brais Apr 13 '16 at 20:04
@HadiBrais my answer lays out a route to removing some possible confounding factors and establishing that it is a gcc bug – M.M Apr 13 '16 at 23:42

@HadiBrais it seems that this is intentional non-compliance by gcc, and there is a paper recommending that the standard be changed – M.M Apr 26 '16 at 00:42

Peter - Reinstate Monica · Answer 5 · 2016-03-18T13:48:26.340


Re-reading your program I see that you are (understandably) baffled by the fact that in the optimized version

p == &b

is false, while

(uintptr_t)p == (uintptr_t)&b;

is true. The last line indicates that the numerical values are indeed identical; how can p == &b then be false??

I must admit that I have no idea. I am convinced that it is a gcc bug.

After a discussion with M.M I think I can make the following case if the conversion to uintptr_t goes through an intermediate void pointer (you should include that in your program and see whether it changes anything):

Because both steps in the conversion chain int* -> void* -> uintptr_t are guaranteed to be reversible, unequal int pointers can logically not result in equal uintptr_t values.¹ (Those equal uintptr_t values would have to convert back to equal int pointers, altering at least one of them and thus violating the value-preserving conversion rule.) In code (I'm not aiming for equality here, just demonstrating the conversions and comparisons):

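#include <assert.h>
#include <stdint.h>

int main(void) {
    int a, b, *ap = &a, *bp = &b;
    assert(ap != bp);

    // int* -> void*: reversible (C11 6.3.2.3p1)
    void *avp = ap, *bvp = bp;

    // void* -> uintptr_t: reversible (C11 7.20.1.4p1)
    uintptr_t ua = (uintptr_t)avp, ub = (uintptr_t)bvp;

    // Now the following holds:
    // if ap != bp then *necessarily* ua != ub.
    // This is violated by the OP's case (sans the void* step).

    // Converting back must restore each original pointer.
    assert((int *)(void *)ua == ap && (int *)(void *)ub == bp);
    return 0;
}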

¹This assumes that the uintptr_t doesn't carry hidden information in the form of padding bits which are not evaluated in an arithmetic comparison but possibly in a type conversion. One can check that through CHAR_BIT, UINTPTR_MAX, sizeof(uintptr_t) and some bit fiddling.
For a similar reason it's conceivable that two uintptr_t values compare different but convert back to the same pointer (namely if there are bits in uintptr_t not used for storing a pointer value, and the conversion does not zero them). But that is the opposite of the OP's problem.

edited Mar 18 '16 at 13:48

answered Mar 16 '16 at 13:36

Peter - Reinstate Monica


The language actually makes an anti-guarantee. – too honest for this site Mar 16 '16 at 13:51
@Olaf Well, "undefined" doesn't *guarantee* doom. It just doesn't make *any* guarantee. – Peter - Reinstate Monica Mar 16 '16 at 13:52
Well, that's what I mean with "anti-guarantee": It guarantees if you rely on it, you are outside the standard. Prepare for nasal demons. – too honest for this site Mar 16 '16 at 13:56

There is no undefined behaviour in `p == &b` . Conceivably there is undefined behaviour in the cast to `uintptr_t` although that seems unlikely. – M.M Mar 17 '16 at 23:59
@PeterA.Schneider: gcc is well within the standard when aggressively optimising. Actually I prefer such a compiler to one which pads every statement in soft cushions before optimizing. It might not be beginner-friendly, but that is not gcc's fault, but the C language itself. Luckily other compilers (notably from the embedded field) seem to start following gcc. Also note that gcc provides a large set of diagnostics to report quite a few potential problems (but not all). – too honest for this site Mar 18 '16 at 04:00
@M.M. Thanks for pointing that out. Interesting -- the other relational operators are undefined. Any idea why that is so? Btw, a strict reading of the standard would make it undefined to test &b-1 == &a, even if &a+1 == &b (you can peek behind an array, but not in front...) – Peter - Reinstate Monica Mar 18 '16 at 05:24

`a < b` is difficult to test on a segmented architecture, or something. There might not even be an absolute ordering of all possible pointers. – M.M Mar 18 '16 at 06:28
@M.M Difficult: Hm (byte by byte? How do you test for equality?). No Ordering: Sure; but that makes it unspecified (the standard could explicitly void any of the usual guarantees), or implementation defined, but won't crash. – Peter - Reinstate Monica Mar 18 '16 at 08:53
If the number of possible values for a uint64_t exceeds the number of possible different pointers, there's no requirement that p==q implies (uintptr_t)(void*)p == (uintptr_t)(void*)q. If an implementation never uses bit 57 of a pointer, for example, it would be legal for an implementation to arbitrarily store a zero or one when converting that pointer to a uintptr_t, provided that the value of that bit in the uint64_t is ignored when converting back to a pointer. – supercat Apr 13 '16 at 19:07
@supercat I said that much (equal pointers can conceivably convert to unequal `uint64_t` values) in the last paragraph of my post. Note that that is the opposite of the OP's problem: *His* integers compare already equal, but the pointers don't. – Peter - Reinstate Monica Apr 14 '16 at 07:38
@PeterA.Schneider: Would an implementation be under any obligation to ensure that pointer-to-integer conversions of distinct pointers yield unique numbers if code never actually converted any of the numbers back to pointers? I would expect any implementation that wasn't being deliberately obtuse should do so, but if code never converts any numbers to pointers would there be anything illegitimate about a compiler making all pointer-to-uintptr_t conversions yield 42? – supercat Apr 14 '16 at 15:14
@supercat That would be actively malicious. – Peter - Reinstate Monica Apr 14 '16 at 15:23
@PeterA.Schneider: Of course it would be deliberately obtuse, but probably no more harmful than a lot of other deliberately-obtuse optimizations that have become fashionable, like assuming that `void set_float_bits(float *fp, unsigned u) { *(int*)fp=u; }` won't modify any object of type `float` [note that `memcpy(fp, &u, sizeof *fp);` will often yield inferior performance on almost any compiler that can't figure out where `fp` came from, since the memcpy version could legitimately alias anything, rather than being limited to `float` or integer types.] – supercat Apr 14 '16 at 15:32
@supercat Breaking the aliasing rules is explicitly listed as UB; converting values isn't. It is not even indeterminate. The conversion chain pointer -> void pointer -> integer and back has clearly defined semantics laid out in the relevant sections of the standard. Just because somebody doesn't go all the way doesn't void the intention. That's a big difference to the aliasing issues. – Peter - Reinstate Monica Apr 14 '16 at 15:43
@PeterA.Schneider: The stated purpose of the aliasing rules was to avoid requiring compilers to treat every pointer of unknown origin as potentially aliasing everything whose address has been exposed to outside code; it describes cases where a compiler must recognize aliasing *even though the compiler would have no reason to expect it*. The rule is horribly written [the widespread confusion over what it means is prima facie evidence of that] but I see no reason to believe it was ever intended to allow a non-obtuse compiler writer to ignore aliasing in obvious cases like the above. – supercat Apr 14 '16 at 15:56
@PeterA.Schneider: It would have been perfectly easy in 1990 for compiler writers to refrain from flushing register-cached `float` values before executing a statement which casts a `float*` to an `int*` and dereferences it; had they done so, there would have been immediate outrage and the rules would be recognized as horribly deficient. Instead, compiler writers used common sense to recognize allowable usage patterns beyond those mandated by the Standard, and it is only because of hyper-modernist revisionism that sensible code needs to be replaced with code that is harder to read and slower. – supercat Apr 14 '16 at 16:05
