
Technically, subtracting a null pointer is undefined behaviour in C. Clang 13 issues a warning for it.

Yet this construct is used anyway, usually to determine the alignment of a pointer. For example, BSD-derived implementations of qsort use it. See here (OpenBSD) and an explanation of what it's for:

Snippet of a code sample with null pointer subtraction from OpenBSD. Please see the link above for full context.

```c
#define TYPE_ALIGNED(TYPE, a, es)           \
    (((char *)a - (char *)0) % sizeof(TYPE) == 0 && es % sizeof(TYPE) == 0)
```
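For comparison, here is a minimal sketch of the same alignment test written without the subtraction, by converting the pointer to `uintptr_t`. The function name `type_aligned` is hypothetical, and this is neither the OpenBSD code nor the FreeBSD fix. It assumes the implementation provides the optional `uintptr_t` type and that the converted integer reflects the address, which is implementation-defined but holds on typical flat-memory platforms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from OpenBSD or FreeBSD).
 * Returns 1 if both the address `a` and the element size `es` are
 * multiples of `type_size`, otherwise 0.
 * Assumption: converting a pointer to uintptr_t yields its address,
 * which the C Standard leaves implementation-defined. */
int type_aligned(const void *a, size_t es, size_t type_size)
{
    assert(type_size != 0); /* guard against division by zero */
    return (uintptr_t)a % type_size == 0 && es % type_size == 0;
}
```

With this helper, `type_aligned(a, es, sizeof(long))` plays the role of `TYPE_ALIGNED(long, a, es)`.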

Question: Is such code safe to use on typical modern platforms (64-bit or 32-bit) with typical modern compilers? A lot of prominent production code seems to have used this construct for many years.


I notice that code like this was removed from FreeBSD's qsort (see revision 334928), because GCC miscompiled some of it. I do not understand all the details in the discussion of the issue, and I cannot tell whether the problem was a direct consequence of the null pointer subtraction. Their proposed fix, however, essentially eliminates the null pointer subtraction. I would appreciate some clarification on the topic.

Szabolcs
  • Please provide relevant code that demonstrates the problem instead of providing references. – Vlad from Moscow Nov 01 '21 at 10:37
  • @VladfromMoscow There are several links to relevant code, including links to StackOverflow. – Szabolcs Nov 01 '21 at 10:38
  • From the description it is unclear what is the problem. – Vlad from Moscow Nov 01 '21 at 10:40
  • You wrote "Is such code safe to use..." What code? Where is there the code in your question? – Vlad from Moscow Nov 01 '21 at 10:45
  • In your link to the discussion of the issue, I do not see discussion of the subtraction of `(char *) 0`, at least not directly. Is it in something linked from there? Which specific text discusses that subtraction, and what is your question about it? Some text there refers to the code as violating the aliasing rules, which is not how I would characterize `(char *) a - (char *) 0`. – Eric Postpischil Nov 01 '21 at 10:51
  • @EricPostpischil See [this comment](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83201#c17) which proposes fixing the problem by eliminating the `(char *) a - (char *) 0` part. Does this answer your question? – Szabolcs Nov 01 '21 at 11:25
  • @VladfromMoscow I answered you in my comment. I linked code that illustrates what I am talking about. I am not sure what else to do. Do you have a specific suggestion for how to improve the question other than a generic "add code"? No, the question is not about one specific bit of code. It is about what might go wrong in practice when subtracting the null pointer from another pointer, for which Clang gives a warning. – Szabolcs Nov 01 '21 at 11:29
  • @EricPostpischil To elaborate more, various versions of this `qsort` code are found in many different projects. The GCC issue was about one of these `qsort` variants, found in the SPEC benchmark suite, being miscompiled by GCC under some conditions. The problem is said to be "aliasing violations", but this is reaching the limits of my understanding. Why is subtracting a null pointer an aliasing violation? Is it because we are doing arithmetic on two pointers "not part of the same array"? Or am I misunderstanding that this was the problem? – Szabolcs Nov 01 '21 at 11:35
  • @EricPostpischil (contd.) Their proposed fix does indeed eliminate the null pointer subtraction, and so does the fix in FreeBSD (which I linked to, and which referenced the GCC discussion). In the end I am simply looking for a better understanding of these issues so I can judge for myself what is or isn't safe to do in practical scenarios (regardless of what is "undefined behaviour" according to the standard—that's not always a problem, as it might be defined by all implementations which I need to care about). – Szabolcs Nov 01 '21 at 11:37
  • @Szabolcs: The comments do not say the aliasing violation is in the removed code. Changing the code as shown causes the expression only to ever evaluate to 2, never 0 or 1 as the original code did, and that may be used in some other code to select which type or other source code to use in whatever swap operation it is referring to, so the result may be that some source code that contained an aliasing violation is never used after the change. – Eric Postpischil Nov 01 '21 at 11:53

1 Answer

When the C Standard was written, many hardware platforms performed pointer arithmetic in such a way that adding zero to a null pointer would yield a null pointer with no side effects, and subtracting one null pointer from another would yield zero with no side effects. These behaviors were often useful, since they could eliminate the need for corner-case code when performing tasks involving N-byte chunks of storage, where N might be zero.
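To illustrate the corner case described above, here is a minimal sketch; the name `chunk_len` is hypothetical. Code that stays within what the Standard defines has to special-case an empty chunk represented by two null pointers, whereas on platforms with the behavior described the plain subtraction alone would have sufficed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of bytes in the chunk [begin, end).
 * An empty chunk may be represented by two null pointers. */
size_t chunk_len(const char *begin, const char *end)
{
    /* Explicit corner case: subtracting one null pointer from another
     * is not defined by the C Standard, so handle it separately. */
    if (begin == NULL || end == NULL) {
        assert(begin == end); /* both must be null together */
        return 0;
    }
    return (size_t)(end - begin);
}
```

Here `chunk_len(NULL, NULL)` returns 0 without evaluating any pointer subtraction, and `chunk_len(buf, buf + n)` returns `n` for a valid buffer.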

Even though many platforms could support the aforementioned corner cases without having to generate any extra machine code, it was hardly clear that all platforms would be able to do so (I don't know of any particular platforms that couldn't, but wouldn't be at all surprised if some such platforms existed). The Standard thus handled such situations the same way as it handles other situations where almost all implementations would process a construct in the same useful fashion, but it might be impractical for all to do so: it categorized the action as "Undefined Behavior" but allowed implementations to process it, as a form of "conforming language extension", in a manner consistent with the underlying execution environment.

There was never any doubt about how such constructs should be processed on commonplace platforms. The only doubt would have been whether implementations whose target platforms would require extra machine code to yield the commonplace semantics should generate such extra machine code, and classifying such constructs as UB would allow such decisions to be made by people who were working with such platforms, and would thus be better placed than the Committee to weigh the costs and benefits of supporting the commonplace behavior.

supercat
  • Let me summarize, to confirm my understanding: On most modern platforms, there will be no problem with subtracting a null pointer, such as `int *p;`, `p - (int *) 0`. The problems mentioned in the GCC discussion I linked were not a direct consequence of null pointer subtraction, as pointed out in comments by @EricPostpischil. The reason why null pointer subtraction is "undefined behaviour" in the standard is that it couldn't be reasonably supported on some archaic platforms. – Szabolcs Nov 02 '21 at 09:50
  • @Szabolcs: That's the *reason* it's UB. On the other hand, the fact that the maintainers of the Gratuitously Clever Compiler treat the phrase "non-portable or erroneous" as meaning "non-portable, and therefore erroneous" means that the gcc compiler should be expected to be needlessly incompatible with code that relies upon what the authors of the Standard called "popular extensions" such as this. – supercat Nov 02 '21 at 14:33