22

It looks to me like the following program computes an invalid pointer, since NULL is no good for anything but assignment and comparison for equality:

#include <stdlib.h>
#include <stdio.h>

int main() {

  char *c = NULL;
  c--;

  printf("c: %p\n", c);

  return 0;
}

However, it seems like none of the warnings or instrumentations in GCC or Clang targeted at undefined behavior say that this is in fact UB. Is that arithmetic actually valid and I'm being too pedantic, or is this a deficiency in their checking mechanisms that I should report?

Tested:

$ clang-3.3 -Weverything -g -O0 -fsanitize=undefined -fsanitize=null -fsanitize=address offsetnull.c -o offsetnull
$ ./offsetnull
c: 0xffffffffffffffff

$ gcc-4.8 -g -O0 -fsanitize=address offsetnull.c -o offsetnull
$ ./offsetnull 
c: 0xffffffffffffffff

It seems to be pretty well documented that AddressSanitizer as used by Clang and GCC is more focused on dereference of bad pointers, so that's fair enough. But the other checks don't catch it either :-/

Edit: part of the reason that I asked this question is that the -fsanitize flags enable dynamic checks of well-definedness in the generated code. Is this something they should have caught?

Phil Miller
  • 6
    Performing arithmetic on any pointer not part of an array is UB, with the exception of +1 for one-past-the-end on non-array pointers. – chris Mar 25 '13 at 05:42
  • The compiler only looks at one line at a time, so has no clue that c is NULL. Something like LINT would catch this though. In the case of this program, you are never dereferencing the c variable so nothing invalid ever happens. It is totally OK to do this, and the benefit is that you can now see that you are running on a 64-bit system due to all of the f's! (perhaps the point of the program?) – c.fogelklou Mar 25 '13 at 05:46
  • 4
    @c.fogelklou: You've completely missed the point, and should read what gets posted by others quite carefully - they do confirm that forming that pointer is undefined behavior, regardless of what any one compiler actually does. – Phil Miller Mar 25 '13 at 05:48
  • @chris: Given the stated exception, would that make `NULL+1` valid to compute? – Phil Miller Mar 25 '13 at 05:49
  • @Novelocrat, I highly doubt it. It's because single variables get treated as single element arrays in pointer arithmetic. There was actually a good question about that asked some time ago. – chris Mar 25 '13 at 05:49
  • The comments [here](http://stackoverflow.com/questions/9114657/is-it-undefined-behavior-to-form-a-pointer-range-from-a-stack-address) cast some interesting additional light. – Phil Miller Mar 25 '13 at 05:53
  • Guys, the pointer is never dereferenced. – c.fogelklou Mar 25 '13 at 05:54
  • @Novelocrat, I can't remember if that is the question I recalled, but it is relevant, and the same idea :) – chris Mar 25 '13 at 05:56
  • As is pointed out by linked stuff and elsewhere, many compilers implement `offsetof()` as a macro doing this sort of thing, but they're allowed to do that for themselves, regardless of what they must accept from input code. – Phil Miller Mar 25 '13 at 05:57
  • @Novelocrat. NULL is ((void *)0). You can NEVER do +1 or -1 on a void *. But you can always do +1 on a char *, which c is. You can only do +1 or -1 on defined types (int, char, etc.) but never void *. So your example is moot. – c.fogelklou Mar 25 '13 at 05:58
  • 3
    The example decrements a `char *`. Anyway `NULL` isn't always defined as `((void*)0)` (at least as far as I remember from nitpicks of C++). – chris Mar 25 '13 at 05:59
  • 2
    @c.fogelklou Your definition of being always able to do a +1 on something is not very useful here: It is valid syntax, there is no reason it shouldn't compile, but it is undefined behaviour, at least in C++, period. – juanchopanza Mar 25 '13 at 06:01
  • 3
    @juanchopanza, C as well. I found both relevant sections (non-array pointer arithmetic and one-past-the-end for single variables) to be the same in the C11 draft I have. – chris Mar 25 '13 at 06:02
  • Ah, found it. In C11, `NULL` is an implementation-defined null pointer constant, and that is "any integer constant expression with the value 0, or such an expression cast to type void *". I believe C++11 relies on the C11 definition for it. – chris Mar 25 '13 at 06:08
  • 1
    Alright I defer to you guys. I didn't consider it undefined because it did exactly what I expected it to do - but after some reading realized that I was misinterpreting the meaning of "undefined behaviour." The reason - which isn't given in any of these answers - for this clause is that some architectures may detect invalid pointers like this automatically (when stored in a pointer register) even if the pointer is never dereferenced. NULL is a special case, but NULL-1 is not, so the program, on some architectures, will crash as soon as the pointer is formed. – c.fogelklou Mar 25 '13 at 06:32
  • "pointer arithmetic on pointers not part of array is UB" `#pragma location 0x00000000 char* entire_address_space[ENTIRE_ADRSPACE_SIZE]; ` There. I fixed it. – Dmitri Nov 12 '15 at 17:40
  • What does 'pointer is part of an array' mean? Any pointer points to an element of an array with infinitely many elements before it and after it. How can a pointer not be part of an array? Do you mean an actually defined array? In that case, is a pointer to the -1th element of an array UB? – Chameleon Feb 11 '20 at 10:17
  • Compiler optimizations expose such undefined behavior. This is not a hypothetical scenario; I am here because of this. (I believe) you can reinterpret_cast anything to and from your pointer. But DO NOT do arithmetic on pointers whose values are known at compile time (like nullptr). If you want arithmetic, reinterpret as a 64-bit integer, do the arithmetic, and reinterpret back to a pointer. I believe C++ theorists will blame me for that. Compiler optimizations did this to me: `unsigned char *a = (unsigned char *) nullptr + 1; cout << !(a - 1);` prints 0. – Chameleon Feb 11 '20 at 13:23
  • 1
    @Chameleon If one needs to do arithmetic on the representation of a pointer value, the standard defines `intptr_t` and `uintptr_t` for exactly that purpose. – Phil Miller Feb 11 '20 at 16:00
  • 1
    And yeah, if you had code that said `int *p = new int[10]; int *q = p-1;` I believe that would be UB, since it forms a pointer that doesn't refer to a scalar object, an object in an array, or one past the end of an array. – Phil Miller Feb 11 '20 at 16:03

3 Answers

22

Pointer arithmetic on a pointer that does not point into an array is undefined behavior (for this purpose, a single object counts as an array of one element).
Also, dereferencing a NULL pointer is undefined behavior.

char *c = NULL;
c--;

is undefined behavior because c does not point into an array.

C++11 Standard, 5.7/5:

When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integral expression. In other words, if the expression P points to the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N (where N has the value n) point to, respectively, the i + n-th and i − n-th elements of the array object, provided they exist. Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.
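As a contrast to the null-pointer case, here is a minimal C sketch of pointer arithmetic that stays inside the rules quoted above: every pointer formed points into the array or one past its last element. The helper names `distance` and `sum` are made up for illustration and come from neither the question nor the standard.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of elements between two pointers.
   Both must point into the same array, or one past its last element. */
static ptrdiff_t distance(const int *first, const int *last)
{
    assert(first != NULL && last != NULL);
    return last - first;
}

/* Hypothetical helper: sums n elements using only permitted pointer values. */
static int sum(const int *a, size_t n)
{
    assert(a != NULL);
    const int *end = a + n;  /* one past the end: fine to form, not to dereference */
    int total = 0;
    for (const int *p = a; p != end; ++p)
        total += *p;
    return total;
}
```

A single object behaves like an array of one element here, so `distance(&x, &x + 1)` is also well defined for an `int x`. Nothing comparable holds for a null pointer, which points to no object at all.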

Phil Miller
Alok Save
  • 2
    Clearly, dereferencing a `NULL` pointer is UB, as is 'indirection through' a `NULL` pointer as described in C++11. – Phil Miller Mar 25 '13 at 05:46
  • 1
    The "arithmetic on a pointer to not part of an array" being invalid may be the key here. – Phil Miller Mar 25 '13 at 05:46
  • Is a memory block also what you are calling an array? Or is pointer arithmetic in allocated memory also UB? – dhein Aug 12 '13 at 11:59
  • There are platforms where address zero is a valid address where there are valid reasons to read/write that address. Additionally, while `NULL` *can* be defined to a non-zero value, it's possible NULL is still defined to zero -- in which case, operating on a NULL pointer (not nullptr in C++11) isn't necessarily always UB. – Brian Vandenberg Aug 05 '15 at 16:27
  • rephrase: "in which case, operating on a pointer whose value is equivalent to NULL ..." – Brian Vandenberg Aug 05 '15 at 16:48
  • Would it still be undefined behaviour if I cast the null pointer to `uintptr_t`, perform arithmetic, then cast it back? – Zz Tux Apr 11 '23 at 07:56
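On the `uintptr_t` route raised in the comments, a hedged C sketch: the integer arithmetic itself is ordinary unsigned arithmetic and cannot be undefined, while the pointer-to-integer mapping is implementation-defined. The only guarantee the standard gives is that an unmodified `void *` value survives the round trip; whether an adjusted integer converts back to a usable pointer is up to the implementation. The function names are illustrative assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the integer form of p, minus one.
   The conversion is implementation-defined; the subtraction is plain
   unsigned arithmetic, which wraps instead of being undefined. */
static uintptr_t address_minus_one(void *p)
{
    uintptr_t bits = (uintptr_t)p;
    return bits - 1;
}

/* Hypothetical helper: converting a void * to uintptr_t and straight back
   is guaranteed to give a pointer that compares equal to the original. */
static void check_round_trip(void *p)
{
    assert((void *)(uintptr_t)p == p);
}
```

Note that this sketch never converts the adjusted integer back into a pointer and never dereferences anything; that last step is where the implementation-defined behavior would start to matter.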
17

Yes, this is undefined behavior, and is something that -fsanitize=undefined should have caught; it's already on my TODO list to add a check for this.

FWIW, the C and C++ rules here are slightly different: adding 0 to a null pointer and subtracting one null pointer from another have undefined behavior in C but not in C++. All other arithmetic on null pointers has undefined behavior in both languages.
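Because C leaves even `NULL + 0` and `NULL - NULL` undefined, C code that represents an empty range as `(NULL, 0)` has to special-case it. A minimal sketch, assuming that convention; `range_end` and `range_length` are hypothetical helpers, not part of any library:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: end of the range [base, base + len).
   A null base is handled without performing any pointer arithmetic. */
static const char *range_end(const char *base, size_t len)
{
    if (base == NULL) {
        assert(len == 0);  /* a null base can only describe an empty range */
        return NULL;
    }
    return base + len;     /* caller guarantees base points to at least len bytes */
}

/* Hypothetical helper: last - first, treating two null pointers as length 0
   instead of evaluating NULL - NULL. */
static ptrdiff_t range_length(const char *first, const char *last)
{
    if (first == NULL && last == NULL)
        return 0;
    return last - first;   /* caller guarantees both point into the same array */
}
```

In C++ the two special cases would be redundant, since those two operations are defined there.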

Richard Smith
  • 3
    The intent of [expr.add]p7 sure seems to be that adding 0 to a null pointer, or subtracting two null pointers, is well-defined in C++, but p5 and p6 do explicitly say the behaviour is undefined, and normally, if one part of the standard seems to define the behaviour of a program, while another part says the behaviour is undefined, the part that says the behaviour is undefined wins. –  Jul 31 '13 at 20:18
  • 4
    I've tried to get the wording of p5 and p6 improved (there are a few other things wrong with them) but haven't met with any success so far. Note also that p6's footnote describes another, different and subtly-incompatible, model of pointer arithmetic. – Richard Smith Aug 01 '13 at 02:51
  • @RichardSmith Will it be straightforward to handle code on a platform where writing/reading address 0 is valid, or will we just need to use a sanitizer blacklist for it? – Brian Vandenberg Aug 05 '15 at 17:32
  • @BrianVandenberg: If an application needs to modify things like interrupt tables which start at address zero, I would suggest defining functions to read and write particular specified physical addresses. Unless code is updating such tables particularly frequently, having such code in a separately-linked function which is not visible to the compiler or sanitizer (and could in many cases be easily written in assembly code) shouldn't really hurt performance. – supercat Aug 05 '15 at 19:39
  • Another possibility depending upon how the sanitizer works might be to use something like `int volatile *p; p = (int volatile*)4;` and then accessing that location using `p[-1]`. Normally, given `p[-1]`, there would be value in having the compiler trap when `p` is null, but none in having the compiler trap when `p-1` is null, since the only way the latter could happen would be if something bad had *already* happened. – supercat Aug 05 '15 at 19:40
  • @hvd: An interpretation that UB-ness wins in the absence of explicit language either stating that an action is UB *even when the conditions otherwise necessary to define it apply*, or that an action is defined *unless other conditions apply*, makes a document self-contradictory. It's too bad people haven't denounced as ludicrous compiler writers' notion that a self-contradictory interpretation should be favored over a non-contradictory one. – supercat Sep 21 '16 at 14:28
5

Not only is arithmetic on a null pointer forbidden, but the failure of implementations which trap attempted dereferences to also trap arithmetic on null pointers greatly degrades the benefit of null-pointer traps.

There is never any situation defined by the Standard where adding anything to a null pointer can yield a legitimate pointer value; further, situations in which implementations could define any useful behavior for such actions are rare and could generally better be handled via compiler intrinsics(*). On many implementations, however, if null-pointer arithmetic isn't trapped, adding an offset to a null pointer can yield a pointer which, while not valid, is no longer recognizable as a null pointer. An attempt to dereference such a pointer would not be trapped, but could trigger arbitrary effects.

Trapping pointer computations of the form (null+offset) and (null-offset) would eliminate this danger. Note that protection would not necessarily require trapping (pointer-null), (null-pointer), or (null-null). While the values returned by the first two expressions would be unlikely to have any usefulness [if an implementation were to specify that null-null would yield zero, code which targeted that particular implementation might sometimes be more efficient than code which had to special-case null], they would not generate invalid pointers. Further, having (null+0) and (null-0) yield null pointers rather than trapping would not jeopardize safety and might avoid the need to have user code special-case null pointers, but the advantages would be less compelling since the compiler would have to add extra code to make that happen.
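The policy described here can be approximated in user code. The following is only a sketch of the idea, not how any compiler or sanitizer implements it: a hypothetical `checked_offset` rejects (null+offset) and (null-offset), lets (null+0) yield null, and otherwise performs the arithmetic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: stores p + n in *out and returns true, or returns
   false (leaving *out untouched) when a non-zero offset is applied to null. */
static bool checked_offset(char *p, ptrdiff_t n, char **out)
{
    assert(out != NULL);
    if (p == NULL) {
        if (n != 0)
            return false;  /* (null+offset) or (null-offset): refuse */
        *out = NULL;       /* (null+0): yield null without any arithmetic */
        return true;
    }
    *out = p + n;          /* caller guarantees the result stays within the array */
    return true;
}
```

A real trap would abort instead of returning `false`; the boolean result is used here only so that the rejected case can be observed by the caller.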

(*) Such an intrinsic on an 8086 compiler, for example, might accept unsigned 16-bit integers "seg" and "ofs", and read the word at address seg:ofs without a null trap even when the address happened to be zero. Address (0x0000:0x0000) on the 8086 is an interrupt vector which some programs may need to access, and while address (0xFFFF:0x0010) accesses the same physical location as (0x0000:0x0000) on older processors with only 20 address lines, it accesses physical location 0x100000 on processors with 24 or more address lines. In some cases an alternative would be to have a special designation for pointers which are expected to point to things not recognized by the C standard (things like the interrupt vectors would qualify) and refrain from null-trapping those, or else to specify that volatile pointers will be treated in such fashion. I've seen the first behavior in at least one compiler, but don't think I've seen the second.
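The address arithmetic in the footnote can be worked through in a few lines of C. This sketch only computes the physical address a real-mode seg:ofs pair refers to (segment times 16 plus offset, truncated to the number of address lines); it does not access memory, and `physical_address` is a name invented for the illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: physical address selected by seg:ofs on a processor
   with the given number of address lines (20 on the original 8086). */
static uint32_t physical_address(uint16_t seg, uint16_t ofs, unsigned address_lines)
{
    assert(address_lines > 0);
    uint32_t linear = ((uint32_t)seg << 4) + ofs;  /* at most 0x10FFEF */
    uint32_t mask = (address_lines >= 32)
                        ? UINT32_MAX
                        : ((UINT32_C(1) << address_lines) - 1);
    return linear & mask;                          /* bits above the top line are lost */
}
```

With 20 address lines, `physical_address(0xFFFF, 0x0010, 20)` wraps to 0, the same location as 0x0000:0x0000; with 24 lines the same pair gives 0x100000.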

supercat