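#include <stdio.h>

int main(void)
{
    int a, b;
    int *p = &a;

    /* Pick the neighbour of b that each compiler happens to place at &a. */
#ifdef __clang__
    int *q = &b + 1;    /* one past b: a valid pointer */
#elif defined(__GNUC__)
    int *q = &b - 1;    /* one before b: not defined by the C standard */
#endif

    printf("%p %p %d\n", (void *)p, (void *)q, p == q);
}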

C11 §6.5.9 ¶6 says:

Two pointers compare equal if and only if both are null pointers, both are pointers to the same object (including a pointer to an object and a subobject at its beginning) or function, both are pointers to one past the last element of the same array object, or one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space.

I have tested it four different ways:

  1. Clang 9.0.1 with the -O1 option;
  2. Clang 9.0.1 without any options;
  3. GCC 9.2.0 with the -O1 option;
  4. GCC 9.2.0 without any options.

The results are the following:

$ ./prog_clang
0x7ffebf0a65d4 0x7ffebf0a65d4 1
$ ./prog_clang_01
0x7ffd9931b9bc 0x7ffd9931b9bc 1
$ ./prog_gcc
0x7ffea055a980 0x7ffea055a980 1
$ ./prog_gcc_01
0x7fffd5fa5490 0x7fffd5fa5490 0

What is the correct behavior in this case?

eanmos
  • There is no correct behavior. `a` and `b` are unrelated. – Mat Jan 16 '20 at 21:24
  • Making assumptions about stack layout sends you off into undefined territory. `a` and `b` have no well defined relationship in terms of the language semantics, so you kinda get what you get. The "correct behavior" in this case is to avoid writing code like this =D – Ben Zotto Jan 16 '20 at 21:24
  • @BenZotto: That's clear intuitively, but how does it follow from the language in the standard? – Nate Eldredge Jan 16 '20 at 21:28
  • @NateEldredge: I expanded into an answer. See also AndrewHenle's answer which comes at it from another part of the standard. – Ben Zotto Jan 16 '20 at 21:39
  • It precisely follows the standard. `a` and `b` are NOT pointers to the same object. `a` is an object separate and apart from `b`. Neither `a` nor `b` is an *array*, and neither is `NULL`. – David C. Rankin Jan 16 '20 at 21:51
  • Setting aside the other stuff, the real mystery for me here is why the last test actually produced `0` when the numerical values of the addresses appear to be indeed identical per your logging output. That suggests the compiler (with only a different optimization mode?) is basing that output on actual semantic analysis and ignoring the underlying value compare. That's a new one for me, anyway. – Ben Zotto Jan 16 '20 at 21:56
  • See the linked dup. It's not UB, but GCC has the concept of pointer provenance so there's no guarantee the pointers will compare equal even if the numerical values are the same. – dbush Jan 16 '20 at 21:58
  • @dbush, thanks. Sorry for a duplicate. – eanmos Jan 16 '20 at 21:59
  • @DavidC.Rankin: They are both arrays in this situation. C 2018 6.5.9 (“Equality operators”) 7 says “For the purposes of these operators, a pointer to an object that is not an element of an array behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type.” – Eric Postpischil Jan 16 '20 at 22:45
  • `&b - 1` is not defined by the C standard, because it defines what happens with pointer arithmetic within or just after an object but not what happens if you subtract to point before an object. However, GCC may return false if the code is changed to compare `&a+1` with `&b` even though the addresses it reveals for `a` and `b` show `b` is in fact just beyond `a`.… – Eric Postpischil Jan 16 '20 at 22:51
  • … In this regard, GCC violates the C standard, which says that “Two pointers compare equal **if and only if** … one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space.” (The pointers are treated as pointers to elements in arrays per C 2018 6.5.9 7, quoted in my earlier comment.) – Eric Postpischil Jan 16 '20 at 22:52
  • @EricPostpischil - I did not put those two together. In that case, then it almost reads like it would be a valid-use case if both `p` and `q` are considered pointers to the first element in an array and they both point to the same element. But that seems to somewhat contradict the section cited in the question. – David C. Rankin Jan 16 '20 at 22:56
  • @BenZotto Here the compiler makes assumptions, not the user (except if you nitpick on the way the code is written...) – curiousguy Jan 17 '20 at 04:16
  • @dbush "_GCC has the concept of pointer provenance so there's no guarantee the pointers will compare equal even if the numerical values are the same_" exactly but there is exactly nothing in the std that justifies that behavior – curiousguy Jan 17 '20 at 04:22
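
For readers following the comments above: a minimal sketch of a variant that avoids `&b - 1` and only forms one-past-the-end pointers, which the comments describe as defined. The type `struct layout_probe` and the function `probe_layout` are hypothetical names introduced here, not part of the question. Which flag (if any) comes out true depends on the compiler and optimization level, so the sketch asserts nothing about a particular layout.

```c
#include <stdbool.h>

/* Hypothetical helper type: which adjacency, if any, the comparison reported. */
struct layout_probe {
    bool b_follows_a;  /* result of &a + 1 == &b */
    bool a_follows_b;  /* result of &b + 1 == &a */
};

struct layout_probe probe_layout(void)
{
    int a = 0, b = 0;
    struct layout_probe r;

    /* For pointer arithmetic, a and b each behave like an array of length
       one, so &a + 1 and &b + 1 are valid one-past-the-end pointers.
       Unlike &b - 1, forming them is defined. */
    r.b_follows_a = (&a + 1 == &b);
    r.a_follows_b = (&b + 1 == &a);
    return r;
}
```

Printing both flags next to `(void *)&a` and `(void *)&b` at different optimization levels is one way to observe the kind of discrepancy discussed in this thread.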

2 Answers


What is the correct behavior in this case?

There is none. Comparing pointers to or one past the end of two completely unrelated objects is undefined behavior.

Per footnote 109 of the C11 standard (bolding is mine):

Two objects may be adjacent in memory because they are adjacent elements of a larger array or adjacent members of a structure with no padding between them, or because the implementation chose to place them so, even though they are unrelated. If prior invalid pointer operations (such as accesses outside array bounds) produced undefined behavior, subsequent comparisons also produce undefined behavior.
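
To illustrate the first two cases the footnote lists (adjacent array elements and adjacent structure members), here is a small sketch. `struct pair` and both function names are made up for this example. The structure check compares against `offsetof`, so it does not assume there is no padding.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical type for illustration; not from the question. */
struct pair {
    int first;
    int second;
};

/* Adjacent elements of one array: &arr[0] + 1 and &arr[1] point to the
   same object, so they must compare equal. */
bool array_neighbours_equal(void)
{
    int arr[2] = {0, 0};
    return &arr[0] + 1 == &arr[1];
}

/* Adjacent members of one structure: one past `first` equals the start of
   `second` exactly when there is no padding between them. */
bool struct_neighbours_match_layout(void)
{
    struct pair s = {0, 0};
    bool no_padding = offsetof(struct pair, second) == sizeof s.first;
    bool equal = (&s.first + 1 == &s.second);
    return equal == no_padding;
}
```

The third case in the footnote, unrelated objects that the implementation happens to place side by side, is the one the question is about and is deliberately not covered by this sketch.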

Andrew Henle
  • If I understand the standard paragraph correctly it is only *undefined* behavior here because of the pointer subtraction. It would be merely *unspecified* in the case of comparing `q` (one-past-`b`) to e.g. `&a`. – walnut Jan 16 '20 at 21:42
  • @walnut There's a whole discussion about comparing pointers on this question and its answers: https://stackoverflow.com/questions/45966762/can-an-equality-comparison-of-unrelated-pointers-evaluate-to-true Be sure to read the links to GCC bug reports. – Andrew Henle Jan 16 '20 at 21:59
  • I haven't yet read all of the comments in that question, but the answers seem to agree that it is not undefined behavior. The issue there seems to be whether comparison is required to be consistent. – walnut Jan 16 '20 at 22:05
  • @walnut I'll admit it's a bit of a gray area, but to me it's not possible to get inconsistent results without undefined behavior. Imagine certain architectures that have different pointer types for different object types. Maybe then a `double *` would have the exact same value and compare equal to an `int *`? An architecture doesn't have to be the single flat address space of today's POSIX and Windows systems. – Andrew Henle Jan 16 '20 at 22:15
  • I agree with that. My only gripe is with the word "*undefined behavior*" which I think should be "*unspecified behavior*" here. "*undefined behavior*" would imply that not only does it result in an unspecified value, but also that the standard won't impose any requirement on the program at all. – walnut Jan 16 '20 at 22:20
  • @walnut, if the language specification omits any definition of the behavior of some construct or expression then the behavior is undefined, both in the general English usage sense and in the specific C-language-domain sense. The standard specifically says so. Consequently, unspecified behavior, in the C-language-domain sense, happens only where the standard explicitly says it does. – John Bollinger Jan 16 '20 at 22:26
  • @JohnBollinger But I don't see how the standard is *omitting* a definition here. It says that the pointers compare equal *"if and only if"* one of the listed cases applies. This does not seem to leave any room for an omitted definition to me. The relevant listed item for this case here says that it depends on memory layout, which is unspecified in itself. Why would this particular item be listed if it was supposed to be undefined behavior? Please correct me if I got something wrong here. (Of course the case where OP subtracts one from the pointer is clearly UB because of the invalid subtraction.) – walnut Jan 16 '20 at 22:33
  • @walnut, no, in standardese, the relevant details of the memory layout are *undefined*, not unspecified. In a case such as the OP's, where the value of an equality expression depends on the (undefined) relative layout of two unrelated objects, the value of that expression is undefined. – John Bollinger Jan 16 '20 at 23:11
  • @JohnBollinger So you would say that both variants of OP's program have undefined behavior? (The one for Clang and the one for GCC.) What do you make of the answers in the linked duplicate and by @EricPostpischil in the comments of this question, which seem to contradict that interpretation for the Clang version (which only uses the one-past-the-object pointer)? – walnut Jan 16 '20 at 23:27
  • @walnut, yes, I would say that both variants have undefined behavior, and therefore I disagree with Eric's conclusion that GCC is non-conforming in this regard. I also disagree with the impressive array of answers on the dupe. They all follow the common line of reasoning that one can determine *ex post facto* from the two pointer values that one object immediately follows the other in memory, so as to judge what the result of equality comparison should be, but that argument is not actually supported by the standard. In fact, it's circular. – John Bollinger Jan 16 '20 at 23:42
  • "_Comparing pointers to or one past the end of two completely unrelated objects is undefined behavior_" This is obviously 200% wrong. Wrong on the letter and wrong on the spirit of C. – curiousguy Jan 17 '20 at 04:21
  • @curiousguy Read John Bollinger's comments above. There is no portion of any version of the C standard that defines the relationship between pointers to two completely separate objects. – Andrew Henle Jan 17 '20 at 12:28
  • It is possible to get inconsistent results without undefined behavior. Proof: [GCC defines pointer-to-integer conversion to preserve the pointer bits](https://gcc.gnu.org/onlinedocs/gcc/Arrays-and-pointers-implementation.html#Arrays-and-pointers-implementation). `&a+1` is defined by C 2018 6.5.6 8. `(intptr_t) (&a+1)` and `(intptr_t) &b` are defined by the GCC definition of pointer-to-integer conversion. `&a+1 == &b` is defined by 6.5.9 6 and 7. Then if `(intptr_t) (&a+1) == (intptr_t) &b` and `&a+1 == &b` yield different results, there is inconsistency in defined behavior. – Eric Postpischil Jan 17 '20 at 13:08
  • (GCC may track pointer provenance through the conversion to `intptr_t`, but the inconsistency may also be detected by foiling that provenance tracking by means such as `volatile`, `memcmp`, and passing the values to external functions.) – Eric Postpischil Jan 17 '20 at 13:10
  • The sentence “Comparing pointers to or one past the end of two completely unrelated objects is undefined behavior” is clearly false. Comparison of such pointers is undefined for the relational operators, per C 2018 6.5.8 5. But comparison with the equality operators is defined. 6.5.9 gives an “if and only if” that defines the result for **every** case of two pointers, and nothing in 6.5.9 presents a constraint on whether they point to the same object. – Eric Postpischil Jan 17 '20 at 13:13
  • @AndrewHenle "_There is no portion of any version of the C standard that defines the relationship between pointers to two completely separate objects_" That has no relevance whatsoever. Again, the context was: "_Comparing pointers to or one past the end of two completely unrelated objects is undefined behavior_" – curiousguy Jan 17 '20 at 15:57
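
The consistency check described in the comments above (comparing the pointers directly and comparing their integer conversions) can be sketched roughly as follows. `struct cmp_result` and `compare_both_ways` are hypothetical names, and the sketch assumes the implementation provides `intptr_t`. It does not use `volatile` or other means of defeating provenance tracking, so an optimizing compiler may still inline and fold the comparison.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type for illustration. */
struct cmp_result {
    bool ptr_equal;  /* result of == on the pointers themselves */
    bool int_equal;  /* result of == on their integer conversions */
};

struct cmp_result compare_both_ways(int *one_past, int *start)
{
    struct cmp_result r;
    r.ptr_equal = (one_past == start);
    r.int_equal = ((intptr_t)(void *)one_past == (intptr_t)(void *)start);
    return r;
}
```

Calling `compare_both_ways(&a + 1, &b)` for two separate locals `a` and `b` and finding that the two fields disagree would be the inconsistency the comments describe. Whether that actually shows up depends on the compiler, the optimization level, and the stack layout. For neighbouring elements of a single array, both fields should be true.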

Two pointers compare equal if and only if both are null pointers,

they are not null

both are pointers to the same object (including a pointer to an object and a subobject at its beginning) or function

they do not point to the same object, nor a subobject, nor a function

both are pointers to one past the last element of the same array object,

they are not pointers to array elements.

or one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space.

they are not pointers to array elements.


So, according to the standard, your pointers do not meet the requirements for comparing as equal, and should never have compared as equal.

Now, in your tests, in the first three cases, the pointers did in fact compare as equal. One can say that the compilers do not strictly adhere to the standard, because the standard says "if and only if", but as you have seen, clang and gcc without -O1 behave as if the standard said "if" without the "and only if" part. The compilers simply do not try to take extra measures to ensure that the "and only if" part is respected, so they allow the pointers to compare as equal, as a matter of pure coincidence, despite the fact that according to the standard, they shouldn't.

Because it was pure coincidence, it no longer holds in the last case, for unknown reasons having to do with how the compiler implements its optimizations. The compiler may have decided to reverse the order of the variables on the stack, or to put them farther away from each other, or who knows what.

Mike Nakis
  • Although I would note that even if they were arrays it still wouldn't be defined. It has to be one array in which two elements are sequential. Also it's not "pointers to array", it's "pointers to elements in an array". – S.S. Anne Jan 16 '20 at 21:42
  • But, the standard says "if and only if". So since none of the conditions are met, a literal reading of the standard would say that the pointers must not compare equal. – Nate Eldredge Jan 16 '20 at 21:43
  • The next sentence in the standard after the quoted one specifies that a non-array object is treated as an array of size one for the purpose of that section. So the last quoted part *does* apply. – walnut Jan 16 '20 at 21:46
  • @walnut: `a` and `b` can indeed be considered as one-element arrays. Still doesn't allow you to compare the result of pointer arithmetic on `&a` and `&b` - they are different one-element arrays. – Mat Jan 16 '20 at 21:56
  • @Mat But that is what the last quote in this answer specifies. If they point to different arrays (or objects treated as arrays), one of them past-the-end and one to the start, then the result is defined and either `true` or `false` depending on how the memory layout happens to be chosen. – walnut Jan 16 '20 at 21:58
  • @walnut: yes, precisely. If the second "happens to be" immediately following one-past the end for the first, then they compare equal (in C at least, I'm sure C++ has such wording). But where in the standard can you find specified that `a` is immediately following `b` in memory (or the opposite)? – Mat Jan 16 '20 at 22:05
  • @Mat In my interpretation it was *unspecified* whether `a` immediately follows `b` in address space or it doesn't, making the result of the equality operator *unspecified* as well, but not *undefined*, by the quoted paragraph. This seems to be the majority opinion based on the answers in the linked duplicate, but there seems to be disagreement saying that the layout is *undefined* and therefore that the equality comparison is *undefined behavior* as well. See my discussion with @JohnBollinger on the other answer. I don't know which interpretation is correct. – walnut Jan 16 '20 at 23:58
  • (My comments apply only to the clang variant of OP's code. The GCC variant clearly has UB because of the pointer subtraction.) – walnut Jan 17 '20 at 00:01
  • "_they do not point to the same object, nor a subobject, nor a function_" you can't prove that – curiousguy Jan 17 '20 at 04:21
  • This answer is wrong; the pointers do point to array elements for the purposes of the `==` operator. C 2018 6.5.9 7 says “For the purposes of these operators, a pointer to an object that is not an element of an array behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type.” – Eric Postpischil Jan 17 '20 at 12:57
  • @NateEldredge "_a literal reading of the standard would say that the pointers must not compare equal_" Only if you assume that a ptr to an obj is never one past another. – curiousguy Jan 17 '20 at 23:49
  • @curiousguy: Well, that clause only applies if the pointers are pointers to *array* objects, and this answer claims that they are not. But as Eric points out, the answer is incorrect in that regard. – Nate Eldredge Jan 18 '20 at 03:46