
Is the difference of two non-void pointer variables defined (per C99 and/or C++98) if they are both NULL valued?

For instance, say I have a buffer structure that looks like this:

struct buf {
  char *buf;
  char *pwrite;
  char *pread;
} ex;

Say ex.buf points to an array or some malloc'ed memory. If my code always ensures that pwrite and pread point within that array or one past its end, then I am fairly confident that ex.pwrite - ex.pread will always be defined. However, what if pwrite and pread are both NULL? Can I expect subtracting the two to be defined as (ptrdiff_t)0, or does strictly compliant code need to test the pointers for NULL? Note that the only case I am interested in is when both pointers are NULL (which represents an uninitialized buffer). The reason has to do with writing a fully compliant "available" function, given that the preceding assumptions are met:

size_t buf_avail(const struct buf *b)
{     
    return b->pwrite - b->pread;
}
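
If strictly compliant code does need the test, the function would presumably grow a check like the sketch below (the checked variant and its name are my own illustration, not part of the original code):

```c
#include <stddef.h>  /* size_t, NULL */

struct buf {
  char *buf;
  char *pwrite;
  char *pread;
};

/* Checked variant: avoids subtracting two null pointers (the case under
   question) by treating the uninitialized buffer specially. */
size_t buf_avail_checked(const struct buf *b)
{
    if (b->pread == NULL)   /* both pointers NULL: buffer not initialized */
        return 0;
    return (size_t)(b->pwrite - b->pread);
}
```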
Destructor
John Luebs
  • have you tried doing the operation more than once? – Hunter McMillen Nov 14 '11 at 21:14
  • What do you mean? I know for a fact that the result of this operation is 0 on 95% (let's say the 5% is AS/400) of implementations out there and nothing bad will happen. I am not interested in the implementation specifics. My question pertains to some specific standard definitions. – John Luebs Nov 14 '11 at 21:27
  • Hunter McMillen: That is a bad approach - "I stored a pointer in an int and nothing happened. I checked on a different computer and compiler and nothing happened. Then came 64-bit computers." If something works now but relies on undefined behaviour, it may not work in the future. – Maciej Piechotka Nov 14 '11 at 22:34
  • I commend you for ensuring your code is *guaranteed* to work by the relevant standards rather than just noticing that it happened to work on the platforms you tested. – David Schwartz Nov 15 '11 at 01:33
  • The likely unusual cases are platforms which may have several representations of NULL - for example the 8086 segmented architecture. – Toby Speight Apr 12 '16 at 08:33
  • @TobySpeight: An 8086 compiler will have a different representation for a `near`-qualified null pointer from a `far`-qualified one, but would it use multiple representations for null `far` pointers? If a null `near` pointer is converted to a `far` pointer that is in turn compared to a null `far` pointer when e.g. DS equals 0x1234, what happens: (1) 0x0000 gets converted to 0x0000:0x0000; (2) 0x0000 gets converted to 0x1234:0x0000, but the comparison operator checks for the both-segments-zero case; or (3) 0x0000 gets converted to 0x1234:0x0000, which compares unequal to 0x0000:0x0000. – supercat Oct 18 '18 at 17:27
  • @TobySpeight: Since the Standard doesn't say anything about `near` and `far` pointers, it wouldn't require any particular approach. Approach #1 would slow down near-to-far conversions and approach #2 would slow down comparisons, but approach #3 might be somewhat astonishing. For most purposes, I think the quirky semantics of #3 would be better than the quirks of #2 or the performance drain of #1, but I'm not sure what compilers actually do. – supercat Oct 18 '18 at 17:30

4 Answers


In C99, it's technically undefined behavior. C99 §6.5.6 says:

7) For the purposes of these operators, a pointer to an object that is not an element of an array behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type.

[...]

9) When two pointers are subtracted, both shall point to elements of the same array object, or one past the last element of the array object; the result is the difference of the subscripts of the two array elements. [...]

And §6.3.2.3/3 says:

An integer constant expression with the value 0, or such an expression cast to type void *, is called a null pointer constant. If a null pointer constant is converted to a pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal to a pointer to any object or function.

Since a null pointer compares unequal to a pointer to any object, it violates the preconditions of §6.5.6/9, so the subtraction is undefined behavior. In practice, though, I'd be willing to bet that pretty much every compiler will return a result of 0 without any ill side effects.

In C89, it's also undefined behavior, though the wording of the standard is slightly different.

C++03, on the other hand, does have defined behavior in this instance. The standard makes a special exception for subtracting two null pointers. C++03 §5.7/7 says:

If the value 0 is added to or subtracted from a pointer value, the result compares equal to the original pointer value. If two pointers point to the same object or both point one past the end of the same array or both are null, and the two pointers are subtracted, the result compares equal to the value 0 converted to the type ptrdiff_t.

C++11 (as well as the latest draft of C++14, n3690) has identical wording to C++03, with just the minor change of std::ptrdiff_t in place of ptrdiff_t.

Adam Rosenfield
  • Due to its completeness, this is the best answer currently. – John Dibling Nov 14 '11 at 21:40
  • This seems like an oversight in the standard that should be corrected by "9) When two pointers are subtracted, if they are equal, the result is zero. Otherwise, both shall point to elements of the same array object..." – R.. GitHub STOP HELPING ICE Nov 15 '11 at 05:41
  • @R.., to disambiguate it should say "compare equal" no? Because two null pointers might not contain the same value, so they "are" not equal. – Jens Gustedt Nov 15 '11 at 11:06
  • It also looks like the [latest draft of the upcoming C1X standard](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf) also has the same language. I hope that this is in fact an oversight and that the language committee fixes it. – Adam Rosenfield Nov 15 '11 at 15:51
  • Two null pointers are the same *value* by virtue of comparing equal. Of course they may not have the same *representation*. – R.. GitHub STOP HELPING ICE Nov 15 '11 at 16:38
  • Subtracting two nulls is still legal in C++11 and the latest C++14 draft. (Just reassuring everyone, since I see no reason to ever rescind this.) – CTMacUser Oct 15 '13 at 05:48
  • @R.. if a system has multiple representations for null pointers, then this proposed rule would not allow a compiler to emit a simple subtraction instruction for subtraction of `char *`, for example; i.e. it could impose undue performance penalties . I'm not sure what the use-case would be either – M.M Oct 18 '18 at 22:06

I found this in the C++ standard (5.7 [expr.add] / 7):

If two pointers [...] both are null, and the two pointers are subtracted, the result compares equal to the value 0 converted to the type std::ptrdiff_t

As others have said, C99 requires that two pointers being subtracted point to elements of the same array object, or one past its last element. NULL does not point to a valid object, which is why you cannot use it in subtraction.

Pubby

Edit: This answer is only valid for C, I didn't see the C++ tag when I answered.

No, pointer arithmetic is only allowed for pointers that point within the same object. Since, by the C standard's definition, null pointers don't point to any object, this is undefined behavior.

(Although, I'd guess that any reasonable compiler will return just 0 on it, but who knows.)

Jens Gustedt
  • +1: This was almost exactly the argument I was going to use in a post of my own. – Oliver Charlesworth Nov 14 '11 at 21:17
  • @OliCharlesworth, for once I have been faster, doesn't happen too often. – Jens Gustedt Nov 14 '11 at 21:19
  • The standard includes a lot of exceptions for special cases like this, have you checked to make sure this isn't specifically addressed? – Mark Ransom Nov 14 '11 at 21:23
  • This is incorrect. See 5.7 [expr.add] / 7: "If two pointers point to the same object or both point one past the end of the same array or **both are null**, and the two pointers are subtracted, the result compares equal to the value 0 converted to the type `std::ptrdiff_t`." – CB Bailey Nov 14 '11 at 21:25
  • @Charles: It is correct for C, but the answer should explicitly say so. – Oliver Charlesworth Nov 14 '11 at 21:27
  • This is incorrect in the specific context of C++. What about for the other languages tagged? – John Dibling Nov 14 '11 at 21:27
  • @OliCharlesworth: I see, I only saw this question while browsing C++ tags. – CB Bailey Nov 14 '11 at 21:28
  • @Joergen: The relevant section of the C99 standard is 6.5.6 paragraph 9. – Oliver Charlesworth Nov 14 '11 at 21:29
  • Wow, never thought this lame question would touch on a C/C++ spec difference. – John Luebs Nov 14 '11 at 21:29
  • As we know this answer is incorrect. Maybe you should delete it to avoid confusion. – Martin York Nov 14 '11 at 21:40
  • @JohnDibling, editing my answer before I got a chance to do that myself? I found this behavior particularly rude. – Jens Gustedt Nov 14 '11 at 22:08
  • @JohnDibling: Still this is first of all *my* answer, I didn't make it wiki. In all the experience I have on this site people try first out to solve such things by reason. I think this is the first time somebody acts like this in all the time I am around here. I was really pissed off by it. – Jens Gustedt Nov 14 '11 at 22:14
  • @Jens: The FAQ and site administrators encourage us to edit answers if we can make them better. Your answer is better now than it was before. I did not intend to offend and, to be honest, given the policies of the site I think you are out of line to *be* offended. See: http://stackoverflow.com/privileges/edit – John Dibling Nov 14 '11 at 22:17
  • Jens, ordinarily I would agree with you. I try never to edit someone's answer but to point out my disagreements with a comment - fewer feathers are ruffled and they might learn something. Or maybe I'm wrong instead and my edit would be counter-productive. But in this case I think John's edit was justified, because your answer was top-rated but clearly not 100% correct. It was necessary to keep people from "piling on" to a correct-looking answer without considering the alternatives. – Mark Ransom Nov 14 '11 at 22:25
  • Thank you @mark for your eloquent description of what were, in fact, my motives. For what it's worth, this is the first time I have ever edited an answer in such a way. In this case I waited about 20 minutes but ultimately felt that time was of the essence. – John Dibling Nov 14 '11 at 22:50

The C Standard does not impose any requirements on the behavior in this case, but many implementations do specify the behavior of pointer arithmetic in many cases beyond the bare minimums required by the Standard, including this one.

On any conforming C implementation, and nearly all (if not all) implementations of C-like dialects, the following guarantees will hold for any pointer p such that either *p or *(p-1) identifies some object:

  • For any integer value z that equals zero, the pointer values (p+z) and (p-z) will be equivalent in every way to p, except that they will only be constant if both p and z are constant.
  • For any q which is equivalent to p, the expressions p-q and q-p will both yield zero.

Having such guarantees hold for all pointer values, including null, may eliminate the need for some null checks in user code. Further, on most platforms, generating code that upholds such guarantees for all pointer values without regard for whether they are null would be simpler and cheaper than treating nulls specially. Some platforms, however, may trap on attempts to perform pointer arithmetic with null pointers, even when adding or subtracting zero. On such platforms, the number of compiler-generated null checks that would have to be added to pointer operations to uphold the guarantee would in many cases vastly exceed the number of user-generated null checks that could be omitted as a result.

If there were an implementation where the cost of upholding the guarantees would be great, but few if any programs would receive any benefit from them, it would make sense to allow it to trap "null+zero" computations, and require that user code for such an implementation include the manual null checks that the guarantees could have made unnecessary. Such an allowance was not expected to affect the other 99.44% of implementations, where the value of upholding the guarantees would exceed the cost. Such implementations should uphold such guarantees, but their authors shouldn't need the authors of the Standard to tell them that.

The authors of C++ have decided that conforming implementations must uphold the above guarantees at any cost, even on platforms where they could substantially degrade the performance of pointer arithmetic. They judged that the value of the guarantees even on platforms where they would be expensive to uphold would exceed the cost. Such an attitude may have been affected by a desire to treat C++ as a higher-level language than C. A C programmer could be expected to know when a particular target platform would handle cases like (null+zero) in unusual fashion, but C++ programmers weren't expected to concern themselves with such things. Guaranteeing a consistent behavioral model was thus judged to be worth the cost.

Of course, nowadays questions about what is "defined" seldom have anything to do with what behaviors a platform can support. Instead, it is now fashionable for compilers to--in the name of "optimization"--require that programmers manually write code to handle corner cases which platforms would previously have handled correctly. For example, if code which is supposed to output n characters starting at address p is written as:

void out_characters(unsigned char *p, int n)
{
  unsigned char *end = p+n;
  while(p < end)
    out_byte(*p++);
}

older compilers would generate code that would reliably output nothing, with no side-effect, if p==NULL and n==0, with no need to special-case n==0. On newer compilers, however, one would have to add extra code:

void out_characters(unsigned char *p, int n)
{
  if (n)
  {
    unsigned char *end = p+n;
    while(p < end)
      out_byte(*p++);
  }
}

which an optimizer may or may not be able to get rid of. Failing to include the extra code may cause some compilers to figure that since p "can't possibly be null", any subsequent null pointer checks may be omitted, thus causing the code to break in a spot unrelated to the actual "problem".

supercat