
I am trying to understand pointer comparison operators in C programs.

ISO/IEC 9899:2011 specifies that comparing pointers to different objects with the relational operators (< or >) is undefined behavior.

However, experimenting, I found that when "unrelated" pointers are compared, all the compilers/interpreters I tested seem to treat them as just "numbers that happen to represent a location in memory".
Is this always the case? If so, why isn't this part of the standard?

To put this differently: can there be an edge case where pointer p points to, say, virtual memory address 0xffff and pointer b to 0x0000, yet (p < b) evaluates to true?
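
For example, the kind of test I ran looks roughly like this (a sketch, not the exact program):

#include <stdio.h>

int main(void)
{
    int a;
    int b;
    int* p = &a;
    int* q = &b;

    /* a and b are distinct objects, so this comparison is undefined
       behavior - yet it appears to just compare raw addresses. */
    printf("%d\n", p < q);
    return 0;
}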

oxynoia

4 Answers


Note that "undefined behaviour" does not mean "will crash" or "will do bad stuff." It means "there is no definition of what will happen; literally anything is allowed to happen." And when optimisations get into the picture, literally anything can actually happen, too.

Regarding your observation: you've probably tested this on the x86 or x86_64 architecture. On those, it's still likely that you will get the behaviour you've observed (even though it's technically undefined). However, keep in mind that the C specification is intended to work on all platforms and architectures where C can be used, including exotic embedded platforms, specialised hardware, etc. On such platforms, I'd be much less certain of the results of such pointer comparisons.

Angew is no longer proud of SO
  • Re: “it's almost guaranteed that for any sane compiler, you will get the behaviour you've observed”: No, that is not correct. GCC and Clang have been getting more aggressive about pointer provenance: they take note of how pointers are derived and include that information in optimization, or just generally in how they encode the semantics of the program. The result is that these pointer comparisons do not work simply as if they compared addresses. – Eric Postpischil Oct 10 '19 at 17:37
  • @EricPostpischil Thanks; I've weakened the claim. – Angew is no longer proud of SO Oct 11 '19 at 08:26
  • @EricPostpischil: I don't think the statement "Any sane compiler will process construct X reasonably" contradicts "GCC and clang process construct X nonsensically". The authors of the Standard follow the principle that the more absurd something would be, the less need there should be to prohibit it. They made no attempt to accommodate all the situations where compilers would have to, at least from a late-1980s point of view, go absurdly far out of their way not to behave usefully. In many cases the Standard left actions as "Undefined Behavior"... – supercat Apr 06 '21 at 15:02
  • ...not because implementations weren't expected to process them meaningfully, but rather because *the authors of the Standard saw no reason to imagine that implementations might do otherwise* in the absence of a mandate. – supercat Apr 06 '21 at 15:04

Is this always the case? If so, why isn't this part of the standard?

Most of the time, but not necessarily. There are various oddball architectures with segmented memory areas. The C standard also wants to allow pointers to be abstract items that are not necessarily equivalent to physical addresses.

Also, in theory, if you have something like this:

#include <stdio.h>

int main(void)
{
    int a;
    int b;
    int* pa = &a;
    int* pb = &b;

    if (pa < pb) // undefined behavior: a and b are separate objects
        puts("less");
    else
        puts("more");
    return 0;
}

Then the compiler could in theory replace the whole if-else with puts("more"), even if the address of pa is lower than the address of pb, because it is free to deduce that pa and pb cannot be meaningfully compared, or that the comparison always yields false. This is the danger of undefined behavior - what code the compiler generates is anyone's guess.

In practice, the undefined behavior in the above snippet seems to lead to less efficient code, at -O3 with gcc and clang on x86. It compiles into two loads of the addresses and then a run-time comparison, even though the compiler should be able to calculate all addresses at compile time.

When changing the code to well-defined behavior:

int a[2];
int* pa = &a[0]; // both pointers now point into the same array,
int* pb = &a[1]; // so the comparison pa < pb is well-defined

Then I get much better machine code - the comparison is now calculated at compile time and the whole program is replaced by a simple call to puts("less").
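
If you do need an ordering of unrelated pointers, one common workaround - a sketch, not something the standard promises will order by address - is to convert to uintptr_t first. The pointer-to-integer conversion is implementation-defined rather than undefined (and note that uintptr_t is an optional type):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    int a;
    int b;
    int* pa = &a;
    int* pb = &b;

    /* The pointer-to-integer conversion is implementation-defined,
       so the comparison below is at least not undefined behavior. */
    if ((uintptr_t)pa < (uintptr_t)pb)
        puts("less");
    else
        puts("more");
    return 0;
}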

On embedded systems compilers, however, you are almost certainly able to access any address as if it were an integer - as a well-defined, non-standard extension. Otherwise it would be impossible to write things like flash drivers, bootloaders, CRC memory checks, etc.
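
For example, a sketch of such a CRC-style memory check (the flash addresses are made up, and the integer-to-pointer cast relies on exactly this kind of implementation-defined extension):

#include <stdint.h>

/* Hypothetical flash region - the addresses are made up. */
#define FLASH_START 0x08000000u
#define FLASH_END   0x08010000u

/* Toy stand-in for a CRC memory check: sum every byte of flash.
   Casting a raw integer address to a pointer is non-standard in
   general, but bread and butter in embedded C. */
uint32_t flash_checksum(void)
{
    uint32_t sum = 0;
    const volatile uint8_t* p = (const volatile uint8_t*)FLASH_START;

    while ((uintptr_t)p < FLASH_END) /* compare the addresses as integers */
    {
        sum += *p;
        p++;
    }
    return sum;
}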

Lundin
  • Thanks for the answer. You mentioned segmented memory could lead to unexpected results. How about paging? Could it be that the compiler compares only offsets, and thus leads to unexpected results too? – oxynoia Oct 10 '19 at 14:34
  • @oxynoia Paging poses essentially the same problem as segmented memory. It's a perfect example of when this can go wrong, really: on the average low-end MCU with paging, pointers will be 16 bit. Comparing pointers without including the page address might give you any random result. The page address won't be compared unless you use a `far` qualifier or such, giving you a 24 bit pointer. – Lundin Oct 10 '19 at 15:03

Is this always the case?

Most of the time, and on popular architectures with "flat" memory spaces. (Or at least, this used to be the case. As a comment reminds me, this is yet another example of the sort of thing that used to be undefined-but-you-could-probably-get-away-with-it, but is migrating towards undefined-and-don't-touch-it-with-a-ten-foot-pole.)

If so, why isn't this part of the standard?

Because it's absolutely not true all of the time, and C has never been interested in limiting itself to one set of architectures in that sort of way.

In particular, "segmented" memory architectures were once very, very popular (think MS-DOS), and depending on the memory model you used, heterogeneous pointer comparisons definitely didn't work.
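
To make the failure mode concrete, here is a sketch of the arithmetic (plain portable C modeling 8086 real mode, no actual far pointers involved). The linear address is segment * 16 + offset, so distinct segment:offset pairs can name the same byte, and comparing only the 16-bit offsets - which is what the small memory models did - can order them arbitrarily:

#include <stdio.h>

/* 8086 real mode: linear address = segment * 16 + offset. */
static unsigned long linear(unsigned seg, unsigned off)
{
    return (unsigned long)seg * 16 + off;
}

int main(void)
{
    /* Two different segment:offset pairs naming linear address 0x12345. */
    unsigned seg1 = 0x1234, off1 = 0x0005;
    unsigned seg2 = 0x1230, off2 = 0x0045;

    printf("linear: %#lx vs %#lx\n",
           linear(seg1, off1), linear(seg2, off2)); /* equal */

    /* A compiler that compares only the 16-bit offsets sees these
       "pointers" as unequal - and could order them either way. */
    printf("offset compare: %d\n", off1 < off2); /* prints 1 */
    return 0;
}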

Steve Summit
  • The fact that an architecture has a flat memory address space does not mean that pointer comparison between unrelated objects will work. Compiler optimization may break it. – Eric Postpischil Oct 10 '19 at 12:33
  • @EricPostpischil: Indeed, compiler optimization can break even equality comparisons between `restrict` pointers and other pointers (as would be useful, on compilers that support them, in cases where a function would receive pointers to the start and end of a region of memory, where the latter is used for no purpose except to identify when loops involving a pointer based upon the former should stop). I don't think the authors of the Standard intended to forbid such comparisons, but the way clang and gcc interpret the Standard breaks them. – supercat Apr 06 '21 at 14:53

Is this always the case?

No. There's no guarantee that separate objects will be laid out in any particular order. There's no guarantee that all objects occupy the same memory segment.

If so, why isn't this part of the standard?

See above.

"Undefined behavior" means exactly this:

3.4.3
1 undefined behavior
behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements

2 NOTE Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).

3 EXAMPLE An example of undefined behavior is the behavior on integer overflow

C 2011 online draft

In plain English, neither the compiler nor the runtime environment is required to handle the situation in any particular way, and the result could quite literally be anything. Your code could crash immediately. You could enter a bad state such that your program crashes elsewhere (those issues are fun to debug, let me tell you). You could corrupt other data. Or your code could appear to run just fine and have no obvious bad effects, which is the worst possible outcome.

John Bode