9

Some C or C++ programmers are surprised to find out that even storing an invalid pointer is undefined behavior. However, for heap or stack arrays, it's okay to store the address of one past the end of the array, which allows you to store "end" positions for use in loops.

But is it undefined behavior to form a pointer range from a single stack variable, like:

char c = 'X';
char* begin = &c;
char* end = begin + 1;

for (; begin != end; ++begin) { /* do something */ }

Although the above example is pretty useless, this might be useful in the event that some function expects a pointer range, and you have a case where you simply have a single value to pass it.

Is this undefined behavior?
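For concreteness, the use case might look like the sketch below. `count_char` and `count_in_single` are hypothetical names made up for illustration; they stand in for any function that expects a half-open pointer range.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical function that expects a half-open pointer range [first, last).
// It counts how many characters in the range compare equal to `target`.
std::size_t count_char(const char* first, const char* last, char target) {
    assert(first != nullptr && last != nullptr);  // precondition of this sketch
    std::size_t n = 0;
    for (; first != last; ++first) {
        if (*first == target) {
            ++n;
        }
    }
    return n;
}

// Passes a single stack variable to count_char as a one-element range.
std::size_t count_in_single(char c, char target) {
    const char* begin = &c;
    const char* end = begin + 1;  // the pointer this question is asking about
    return count_char(begin, end, target);
}
```

A call such as `count_in_single('X', 'X')` walks the loop exactly once and never dereferences `end`.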

Channel72
  • Are you sure you worded it right, that _storing the address of an invalid pointer_ is undefined behaviour? `int* ptr; int** ptr2 = &ptr` is storing the address of an invalid pointer. Is it UB? And if you mean that we can't have pointers point to invalid memory, then how do we have pointers to `NULL`? – Seth Carnegie Feb 02 '12 at 15:20
  • See related http://stackoverflow.com/questions/8379186/how-do-i-take-the-address-of-one-past-the-end-of-an-array-if-the-last-address-is – ugoren Feb 02 '12 at 15:20
  • @SethCarnegie: `int* ptr; int* ptr2 = &ptr` does not even compile, because the type of `ptr2` does not match up. Also, the `nullptr` is a special case. – Mankarse Feb 02 '12 at 15:23
  • @Seth, NULL is a special value reserved by the standard. – Channel72 Feb 02 '12 at 15:23
  • Channel72 that answered only one of my questions. @Mankarse fixed – Seth Carnegie Feb 02 '12 at 15:23
  • @Seth, I don't see the problem. `int** ptr2 = &ptr` doesn't point to an invalid address. It points to a stack allocated pointer variable. – Channel72 Feb 02 '12 at 15:26
  • @Seth, actually I see what you mean. I think you were confused by my wording - I shouldn't have said "storing the address of an invalid pointer", but rather "storing an invalid pointer". – Channel72 Feb 02 '12 at 15:28
  • Ok thanks, and also, it seems like the answer to the question you linked only indicates that performing arithmetic with invalid pointers is UB. For instance, is `int* ptr = (int*)0x12345678;` UB? (not sure if the cast is needed) – Seth Carnegie Feb 02 '12 at 15:30
  • The undefined behavior is reading the value of an invalid pointer, not storing anything. It's the lvalue to rvalue conversion which triggers the undefined behavior, so even something like `ptr == 0` is undefined behavior if `ptr` is an invalid pointer. – James Kanze Feb 02 '12 at 16:01
  • @SethCarnegie The cast is clearly necessary. And using the value resulting from the cast in any way (including copying it into a named variable) is undefined behavior according to the standard, although an implementation is free to define it if it pleases. – James Kanze Feb 02 '12 at 16:03
  • @James What part of the standard are you referring to when you say that? – Seth Carnegie Feb 02 '12 at 18:23
  • The fact that the standard doesn't define what happens. It's (potentially) an invalid pointer, and an lvalue to rvalue conversion of any invalid object is undefined behavior. (With one exception: character types.) – James Kanze Feb 02 '12 at 19:36
  • @JamesKanze: Union and structure types are another exception in many cases; given `struct { int *p; } foo,bar;` the statement `bar = foo;` may be safely executed regardless of whether `foo` contains a valid pointer. If `foo` holds an invalid pointer, the assignment would cause `bar` to do likewise, and an attempt to use `bar.p` would cause Undefined Behavior. From what I understand there may be some controversy over whether `foo = bar;` would be required to copy anything in the case that `bar` was known to be invalid; I would think that it should be required to do so if any code might... – supercat Jul 04 '15 at 19:37
  • ...use `memcmp` to compare `foo` with a pointer whose lifetime overlapped that of `bar.p`, but I don't think that view is universally shared. – supercat Jul 04 '15 at 19:43

6 Answers

14

This is allowed: the behavior is defined, and both `begin` and `end` are safely-derived pointer values.

In the C++ standard section 5.7 ([expr.add]) paragraph 4:

For the purposes of these operators, a pointer to a nonarray object behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type.

In C, a similar clause can be found in the C99 standard (N1256), section 6.5.6, paragraph 7:

For the purposes of these operators, a pointer to an object that is not an element of an array behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type.
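The quoted rule can be illustrated with a small sketch. The function names `range_length` and `single_object_length` are hypothetical, and the code only assumes what the quoted paragraphs state: for the additive operators, a single object behaves like the first element of an array of length one.

```cpp
#include <cassert>
#include <cstddef>

// Returns the number of elements in [first, last). Both pointers must refer
// to the same array (or the same single object), or one past its end.
template <typename T>
std::ptrdiff_t range_length(const T* first, const T* last) {
    return last - first;
}

// Forms the one-element range over a single object and returns its length.
template <typename T>
std::ptrdiff_t single_object_length(const T& obj) {
    const T* begin = &obj;     // pointer to the object itself
    const T* end = begin + 1;  // one past the end: may be formed, not dereferenced
    assert(end - 1 == begin);  // stepping back yields the original pointer
    return range_length(begin, end);
}
```

Under that reading, `single_object_length(c)` for `char c = 'X';` returns 1, the same result an array of length one would give.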


As an aside, in section 3.7.4.3 ([basic.stc.dynamic.safety]) "Safely-derived pointers" there is a footnote:

This section does not impose restrictions on dereferencing pointers to memory not allocated by ::operator new. This maintains the ability of many C++ implementations to use binary libraries and components written in other languages. In particular, this applies to C binaries, because dereferencing pointers to memory allocated by malloc is not restricted.

This suggests that pointer arithmetic throughout the stack is implementation-defined behavior, not undefined behavior.

tinman
Ben Voigt
  • A lot of conflicting answers and interpretations here, but this answer seems to be pretty definitive. – Channel72 Feb 02 '12 at 15:18
  • Seeing as this question was also tagged C, the same behaviour is specified in the C99 spec (N1256) in section 6.5.6 paragraph 7. – tinman Feb 02 '12 at 15:26
  • Thanks, @tinman. If you have the actual text, feel free to add the quote. – Ben Voigt Feb 02 '12 at 15:29
  • The `[expr.add]` paragraph only seems to apply to the additive operators. I cannot find a corresponding paragraph for relational operators. – Mankarse Mar 01 '12 at 11:19
  • @Mankarse: `end = begin + 1`... that looks like an additive operator to me. – Ben Voigt Mar 01 '12 at 17:44
  • @BenVoigt: Sure, but an algorithm using `RandomAccessIterator`s could legally evaluate `begin < end`, so in general the resulting range would not be safe to pass into an algorithm expecting `RandomAccessIterator`s. – Mankarse Mar 02 '12 at 00:15
  • @Mankarse: A matter of interpretation. I'd say that if adding one "behaves the same", then the pointer addition expression MUST evaluate to a pointer that compares greater than the original, because that happens for pointer addition within an array. – Ben Voigt Mar 02 '12 at 04:00
  • @BenVoigt - True, and it is difficult to see how allowing the creation of a pointer range over a single object could be useful without the resulting pointers acting entirely like pointers to elements of an array. – Mankarse Mar 02 '12 at 04:21
4

I believe that legally, you may treat a single object as an array of size one. In addition, it is most definitely legal to take a pointer one past the end of any array as long as it's not de-referenced. So I believe that it is not UB.

Puppy
3

It is not Undefined Behavior as long as you don't dereference the invalid iterator.
You are allowed to hold a pointer to memory beyond your allocation but not allowed to dereference it.

sehe
Alok Save
  • The question I linked to indicates you are NOT allowed to hold a pointer to an address beyond your allocation, (which surprises many people), unless it is one past the end of an array. – Channel72 Feb 02 '12 at 15:11
  • which this one is. Therefore it is defined here. – CashCow Feb 02 '12 at 15:27
2

5.7-5 of ISO14882:2011(e) states:

When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integral expression. In other words, if the expression P points to the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N (where N has the value n) point to, respectively, the i + n-th and i − n-th elements of the array object, provided they exist. Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.

Unless I have overlooked something, the addition is only defined for pointers that stay within the same array. For everything else, the last sentence applies: "otherwise, the behavior is undefined".

Edit: Indeed, once you take 5.7-4 into account, the operation in the question is (virtually) performed on an array, so that sentence does not apply:

For the purposes of these operators, a pointer to a nonarray object behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type.
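As a rough illustration of the two paragraphs read together, the sketch below uses a hypothetical `sum` function over a real array. Forming `arr + N` is covered by the "one past the last element" wording; going any further is what the final sentence of 5.7-5 leaves undefined, so that line is only shown as a comment.

```cpp
#include <cassert>
#include <cstddef>

// Sums the elements of a built-in array using a [first, last) pointer range.
template <std::size_t N>
int sum(const int (&arr)[N]) {
    const int* first = arr;
    const int* last = arr + N;  // one past the last element: well-defined
    // const int* bad = arr + N + 1;  // beyond one past the end: undefined behavior
    assert(last - first == static_cast<std::ptrdiff_t>(N));
    int total = 0;
    for (; first != last; ++first) {
        total += *first;  // only elements inside the array are dereferenced
    }
    return total;
}
```

With 5.7-4 applied, a single `int` plays the role of an array with `N == 1`, so the same `first`/`last` pattern carries over.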

PlasmaHH
0

In general it would be undefined behaviour to point beyond the memory space; however, there is an exception for "one past the end", which is valid according to the standard.

Therefore in the particular example, &c+1 is a valid pointer but cannot be safely dereferenced.

CashCow
-4

You could define c as an array of size 1:

char c[1] = { 'X' };

Then the undefined behavior would become defined behavior. Resulting code should be identical.
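A small sketch comparing the two spellings, assuming a hypothetical helper `to_string_range` that consumes a pointer range. It only shows that both forms describe the same one-element range; it does not by itself settle whether the non-array form is defined.

```cpp
#include <cassert>
#include <iterator>
#include <string>

// Hypothetical helper: copies the half-open range [first, last) into a string.
std::string to_string_range(const char* first, const char* last) {
    assert(first != nullptr && last != nullptr);  // precondition of this sketch
    return std::string(first, last);
}

// Range taken from an array of size 1, as suggested in this answer.
std::string from_array() {
    char c[1] = { 'X' };
    return to_string_range(std::begin(c), std::end(c));
}

// Range taken from a single char, as in the question.
std::string from_single() {
    char c = 'X';
    return to_string_range(&c, &c + 1);
}
```

Both functions hand the helper a range of exactly one character.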

  • You could but 1. there is no need and 2. that isn't the user's question. There isn't an undefined behaviour here and your answer might get some downvotes although I haven't given it one myself. – CashCow Feb 02 '12 at 15:28
  • It's not undefined behavior to begin with. – Ben Voigt Feb 02 '12 at 15:29
  • If the original code was undefined behavior, then this code would be defined behavior. Similarly, signed integer overflow is undefined behavior (and some compilers exploit this for optimization); there you can cast from a signed to an unsigned type to get defined behavior. – unknownfrog Feb 02 '12 at 16:55