
I noticed this warning from Clang:

warning: performing pointer arithmetic on a null pointer
has undefined behavior [-Wnull-pointer-arithmetic]

Specifically, this is the code that triggers the warning:

int *start = ((int*)0);
int *end = ((int*)0) + count;

The integer constant 0, converted to any pointer type, yields a null pointer. It does not point to any object, but it still has the pointer-to-type needed to do pointer arithmetic.

Why would arithmetic on a null pointer be forbidden when doing the same on a non-null pointer obtained from an integer other than zero does not trigger any warning?

And more importantly, does the C standard explicitly forbid null pointer arithmetic?


Also, this code will not trigger the warning, but this is because the pointer is not evaluated at compile time:

int *start = ((int*)0);
int *end = start + count;

But a good way of avoiding the undefined behavior is to explicitly cast an integer value to the pointer:

int *end = (int *)(sizeof(int) * count);
explogx
  • What are you trying to do ? – Ôrel Jan 17 '19 at 09:40
  • Nothing. I just noticed this warning from Clang and am trying to understand why it's there, and whether it's justified according to the C standard. – explogx Jan 17 '19 at 09:41
  • An integer may be converted to any pointer type. The warning isn't coming from the cast; it's coming from the null pointer arithmetic. Test my code and change 0 to 1, and you will see. – explogx Jan 17 '19 at 09:42
  • The solution is to cast both operands to `uintptr_t` from stdint.h. Then everything is well-defined. – Lundin Jan 17 '19 at 10:04
  • @Lundin no it is not. For example, GCC says that only casting from pointer to real object to integers and back is defined. GCC does not guarantee more than the absolute minimum required by the Standard. https://gcc.gnu.org/onlinedocs/gcc/Arrays-and-pointers-implementation.html – Language Lawyer Jan 17 '19 at 11:36
  • @LanguageLawyer Well... C17 6.3.2.3/6. The result is implementation-defined naturally, as the format of pointers is not specified by the standard. Other than that, the conversion and arithmetic will work just fine. – Lundin Jan 17 '19 at 12:35
  • @Lundin I've shown you a real implementation where what you suggest is not well-defined. Are you sure it is defined in MSVC or Clang? – Language Lawyer Jan 17 '19 at 12:46
  • @LanguageLawyer I think you misunderstood what I meant. The link says that you may not convert to an integer, do arithmetic, then convert back to a pointer. I did not suggest that either - I said that one should convert to `uintptr_t` and do the arithmetic. GCC merely says that they don't support something icky like this: `foo = (int[1]){...}; foo = (int*)((int)foo + 10);`, where the pointer is forced to point beyond the object. – Lundin Jan 17 '19 at 13:53
  • @LanguageLawyer GCC requires something not mentioned in the standard. The restriction that no arithmetic is performed on the integer frankly is ridiculous. – curiousguy Jan 17 '19 at 14:05

2 Answers


The C standard does not allow it.

6.5.6 Additive operators

8 When an expression that has integer type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integer expression. In other words, if the expression P points to the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N (where N has the value n) point to, respectively, the i+n-th and i-n-th elements of the array object, provided they exist. Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.

For the purposes of the above, a pointer to a single object is considered as pointing into an array of 1 element.

Now, ((int*)0) does not point at an element of an array object, simply because a pointer holding a null pointer value does not point at any object. This follows from:

6.3.2.3 Pointers

3 If a null pointer constant is converted to a pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal to a pointer to any object or function.

So you can't do arithmetic on it. The warning is justified: the pointer operand does not point to an element of an array object, so we fall under the clause "otherwise, the behavior is undefined" in the paragraph quoted above.
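
For contrast, here is a minimal sketch of pointer arithmetic that stays inside what 6.5.6p8 defines. The helper names `sum_range` and `sum_array` are hypothetical and only for illustration. The point is that `base + count` is fine when `base` points into an array of at least `count` elements, because the result is at most one past the last element.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: sums the ints in the half-open range [first, last).
   `last` may be a one-past-the-end pointer; it is never dereferenced. */
int sum_range(const int *first, const int *last)
{
    int total = 0;
    for (const int *p = first; p != last; ++p)
        total += *p;
    return total;
}

/* Hypothetical helper: `base` must point into an array holding at least
   `count` elements, so `base + count` is at most one past the end. */
int sum_array(const int *base, size_t count)
{
    assert(base != NULL);   /* a null base would make base + count undefined */
    return sum_range(base, base + count);
}
```

A single object counts as an array of one element, so `sum_array(&x, 1)` is also covered.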

Don't be fooled by the fact the offsetof macro is possibly implemented like that. The standard library is not bound by the constraints placed on user programs. It can employ deeper knowledge. But doing this in our code is not well defined.
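
In user code, the portable route is to use the `offsetof` macro from `<stddef.h>` rather than reproduce its internals. The sketch below assumes a made-up `struct packet` and hypothetical helper names; it only shows that a byte offset obtained from `offsetof` can be applied to a pointer to a real object.

```c
#include <assert.h>
#include <stddef.h>

/* Made-up struct, purely for illustration. */
struct packet {
    char   tag;
    int    length;
    double value;
};

/* Byte offset of `length`, obtained from the library macro. */
size_t length_offset(void)
{
    return offsetof(struct packet, length);
}

/* Reaches the `length` member through a byte offset. `p` points to a real
   object, so the char* arithmetic stays inside that object. */
int *length_field(struct packet *p)
{
    assert(p != NULL);
    return (int *)((char *)p + offsetof(struct packet, length));
}
```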

StoryTeller - Unslander Monica
  • Then why does changing from 0 to 1 suppress this warning? – explogx Jan 17 '19 at 09:50
  • @Prion - Because 1 does not hold a special meaning like 0. 1 is not a null pointer constant, converting it to a pointer type does not produce a null pointer. Conversion of general integers into pointers has implementation defined semantics. – StoryTeller - Unslander Monica Jan 17 '19 at 09:51
  • Note that the `offsetof` _macro_, being a macro, is compiled within the space of the user program. Hence it may not use null pointer arithmetic. – Paul Ogilvie Jan 17 '19 at 09:52
  • @PaulOgilvie - No, it may. Again, it does not fall under the constraints of user code. It is, for all intents and purposes part of the language itself, despite coming from the library. A C compiler *knows* about it. – StoryTeller - Unslander Monica Jan 17 '19 at 09:53
  • The offsetof macro does not trigger this warning but it's definitely the same thing: a null pointer arithmetic. – explogx Jan 17 '19 at 09:54
  • @Prion - I pointed out the distinction in my answer for a reason. Being part of the standard library carries great meaning. – StoryTeller - Unslander Monica Jan 17 '19 at 09:54
  • So if I copy the `offsetof` macro and call it something else, the compiler will trigger on null pointer arithmetic? – Paul Ogilvie Jan 17 '19 at 09:55
  • @PaulOgilvie - Quite possibly. If we take off the language lawyer hat for a sec, the macro may very well contain `_Pragma`'s beyond the simple implementation. If it does not, I'm willing to bet you'll trigger the warning by copying and renaming. – StoryTeller - Unslander Monica Jan 17 '19 at 09:57
  • The standard only defines which headers and functions must be available in the _standard library_. It does describe the required behavior. This does not imply they are "part of the language". The library (a library, any library) may not use constructs in headers that are not allowed by user programs. This follows from the definition of what a library is. – Paul Ogilvie Jan 17 '19 at 10:05
  • @Prion The compiler can't tell if the `1`, being a raw address converted to a pointer, points at an array or not. If you happen to have some array type there in the form of hardware registers, then the compiler will have to accept it. If not, as seems likely, the code has undefined behavior. Funny thing though, many microcontrollers have hardware registers placed on address 0, while at the same time using compilers that implement the null pointer representation as 0 too. – Lundin Jan 17 '19 at 10:08
  • Note that this a new warning, I compiled the same code at my workplace with an older version of Clang without any issues. – explogx Jan 17 '19 at 10:08
  • @PaulOgilvie - And where do you draw that "definition of a library" from, exactly? Your intuition or the C standard? The fact the standard requires a certain behavior is **exactly** why it can treat constructs in the library differently. Anyway, since we are quite obviously at an impasse, and I'm not interested in a lengthy discussion, feel free to downvote or post your own answer. – StoryTeller - Unslander Monica Jan 17 '19 at 10:08
  • @Prion - Means nothing. Compilers improve over time. The lack of a warning never means that behavior is well defined. – StoryTeller - Unslander Monica Jan 17 '19 at 10:09
  • Clang is known to put useless warnings. I would rely on gcc more, using the `-pedantic -Wall -Wextra` flags, I will go to work and test that. – explogx Jan 17 '19 at 10:10
  • @Prion - Do as you will. Warning or no warning, you explicitly asked for the standard's wording on this, and I provided exactly that. – StoryTeller - Unslander Monica Jan 17 '19 at 10:11
  • The thing is, undefined behaviour in this case should not always trigger a warning, I tested my own version of the `offsetof` macro, which does involve null pointer arithmetic, and Clang said nothing about it. – explogx Jan 17 '19 at 10:13
  • @Prion - I feel we are going around in circles. The philosophical aspects of compiler design with regard to warnings have 0 bearing on the normative text in the standard. – StoryTeller - Unslander Monica Jan 17 '19 at 10:15
  • Yes, but why would my code trigger such a warning when the same null pointer arithmetic in the offsetof trick does not? That is the real question. – explogx Jan 17 '19 at 10:18
  • @Prion - Which I also already addressed. The compiler has knowledge that the standard library in a hosted implementation provides `offsetof` and you are not allowed to redefine it, for risk of UB. So it assumes it comes from the library (because we all write UB free programs of course) and doesn't complain. If you are still not satisfied, the "ask a question" button is always available to you. I answered *exactly* the question you posed. – StoryTeller - Unslander Monica Jan 17 '19 at 10:20
  • So the standard disallows null-pointer arithmetic. Next, the standard requires a macro `offsetof` whose natural implementation would use null-pointer arithmetic. So any compiler implementor providing such a null-pointer arithmetic implementation violates the language, though the implementors declare it perfectly safe. So all this compiler implementor can do is have the compiler recognize their own `offsetof` and suppress warnings. Does this description make sense? – Paul Ogilvie Jan 17 '19 at 10:27
  • @PaulOgilvie - No, not *any* compiler implementer. GCC avoids the problem otherwise with `__builtin_offsetof`. – StoryTeller - Unslander Monica Jan 17 '19 at 10:28
  • Clang as well. In fact, I don't see any code that would rely on the null pointer trick for the `offsetof` macro. – explogx Jan 18 '19 at 06:41
  • https://en.wikipedia.org/wiki/Offsetof. Whenever I've checked the `offsetof` macro it has been defined as `((size_t)&(((st *)0)->m))`. Which is fine because it's not really de-referencing a null pointer but evaluates to a constant value. Also there is a huge difference between the specification declaring something is undefined behavior and the compiler implementation giving such behavior a useful outcome. – fdk1342 Feb 03 '19 at 00:32
  • Might be friendly to mention in passing that this construct is allowed in C++. – Nemo Aug 29 '22 at 21:35
  • @Nemo - if it is, you'll have to demonstrate. Last I looked, the normative C++ text did not allow for this – StoryTeller - Unslander Monica Aug 29 '22 at 21:59
  • @StoryTeller-UnslanderMonica It is a somewhat recent development. See http://www.eel.is/c++draft/expr.add#4.1 and http://www.eel.is/c++draft/expr.add#5.1. Introduced in [C++17](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/n4659.pdf#page=146). – Nemo Aug 30 '22 at 14:47
  • @Nemo - While interesting, it doesn't quite make the trick valid. It only "works" when `count == 0`. The one case for the trick that no one worries about. – StoryTeller - Unslander Monica Aug 30 '22 at 15:54
  • @StoryTeller-UnslanderMonica It's the one case this question is asking about. Also it can be useful... Evaluating `p - q` where both might be `NULL` is very possible, and is in fact how I started down this rabbit hole. – Nemo Aug 30 '22 at 21:06
  • @Nemo - *"It's the one case this question is asking about"* nowhere in the question is `count` assumed to be zero. – StoryTeller - Unslander Monica Aug 30 '22 at 21:52

When the C Standard was written, the vast majority of C implementations would, for any non-void* pointer value p, uphold the invariants that p+0 and p-0 both yield p, and p-p will yield zero. More generally, operations like a size-zero memcpy or fwrite that operate on a buffer of size N would ignore the buffer address when N was zero. Such behavior would allow programmers to avoid having to write code to handle corner cases. For example, code to output a packet with an optional payload passed via address and length arguments would naturally process (NULL,0) as an empty payload.
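
As a rough sketch of the packet example, the function below accepts `(NULL, 0)` as an empty payload without depending on how the implementation treats a null pointer. The name `append_payload` and its parameters are assumptions made for illustration. The explicit `len == 0` check matters because, in C standards through C17, passing a null pointer to `memcpy` is undefined even when the size is zero.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: appends an optional payload to `dst`, which has
   capacity `cap` and already holds `used` bytes. Returns the new number of
   used bytes; returns `used` unchanged if the payload is empty or too big. */
size_t append_payload(unsigned char *dst, size_t cap, size_t used,
                      const void *payload, size_t len)
{
    assert(dst != NULL && used <= cap);

    if (len == 0)                      /* covers (NULL, 0): nothing to copy */
        return used;
    if (payload == NULL || len > cap - used)
        return used;                   /* invalid or oversized payload */

    memcpy(dst + used, payload, len);  /* dst + used stays within the buffer */
    return used + len;
}
```

The guard costs one comparison, which is the corner-case code the older implementations let programmers leave out.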

Nothing in the published Rationale for the C Standard suggests that implementations whose target platforms would naturally behave in such fashion shouldn't continue to work as they always had. There were, however, a few platforms where it may have been expensive to uphold such behavioral guarantees in cases where p is null.

As with most situations where the vast majority of C implementations would process a construct identically, but implementations might exist where such treatment would be impractical, the Standard characterizes the addition of zero to a null pointer as Undefined Behavior. The Standard allows implementations to, as a form of "conforming language extension", define the behavior of constructs in cases where it imposes no requirements, and it allows conforming (but not strictly conforming) programs to make use of them. According to the published Rationale, the stated intention was that support for such "popular extensions" be regarded as a "quality of implementation" issue to be decided by the marketplace. Implementations that could support them at essentially zero cost would do so, but implementations where such support would be expensive would be free to support such constructs or not based upon their customers' needs.
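
Code that has to stay strictly conforming can recover the `p+0` and `p-p` invariants with small wrappers. The sketch below uses hypothetical names (`advance`, `distance`) and simply skips the arithmetic in the cases the Standard leaves undefined for null pointers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: behaves like p + n, but yields p unchanged when
   n == 0, so a null p with a zero offset never reaches the + operator. */
int *advance(int *p, size_t n)
{
    if (n == 0)
        return p;
    assert(p != NULL);      /* a non-zero offset needs a real array */
    return p + n;
}

/* Hypothetical helper: behaves like last - first, but yields 0 when the
   pointers compare equal, which covers the case where both are null. */
ptrdiff_t distance(const int *first, const int *last)
{
    if (first == last)
        return 0;
    assert(first != NULL && last != NULL);
    return last - first;    /* both must point into the same array */
}
```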

If one is using a compiler that targets commonplace platforms, and is designed to process the widest range of useful programs reasonably efficiently, then the extended semantics surrounding pointer arithmetic may allow one to write code more efficiently than would otherwise be possible. If one is targeting a compiler that does not value compatibility with quality compilers, however, one should recognize that it may treat the Standard's allowance for quirky hardware as an invitation to behave nonsensically even on commonplace hardware. Of course, one should also be aware that such compilers may behave nonsensically in corner cases where adherence to the Standard would require them to forgo optimizations that are unsound but would "usually" be safe.

supercat