
In short, is the following code considered to have undefined behavior?

int main()
{
    int *p = <some invalid pointer value>;
}

For a compiling example, take the following code:

int main()
{
    int *p = new int;
    delete p; // Now p has an invalid pointer value.
    int *q = p; // UB?
}

I've done some research on the topic; this is the relevant information I've found so far:

A pointer value (according to cppreference) can be one of:

  • A pointer to an object or function;
  • A pointer past the end of an object;
  • The null pointer value;
  • An invalid pointer value.

Also, according to cppreference,

Indirection through an invalid pointer value and passing an invalid pointer value to a deallocation function have undefined behavior. Any other use of an invalid pointer value has implementation-defined behavior.
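As a rough illustration of the categories in that quote, here is a minimal sketch. The function name `read_then_release` is hypothetical, and the commented-out lines are left inactive on purpose: they mark where each category from the quote would apply, while the code that actually runs only touches `p` and `q` while they are still valid.

```cpp
#include <cassert>

// Hypothetical helper: reads through the pointer while it is valid,
// then releases the storage and never touches the pointer value again.
int read_then_release()
{
    int *p = new int(42);
    int *q = p;          // fine: p holds a valid pointer value here
    assert(q == p);      // fine: both values are valid
    int value = *q;      // fine: the object is still alive
    delete p;            // from here on, p and q both hold invalid pointer values

    // *q;                   // indirection: undefined behavior
    // delete q;             // passing to a deallocation function: undefined behavior
    // bool same = (p == q); // any other use: implementation-defined, per the quote

    return value;        // returns the int copied before the delete
}
```

Calling `read_then_release()` is well-defined because everything needed from the pointer is copied out before the `delete`.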

This thread addresses some uses of invalid pointers. Specifically, this answer mentions the Rationale document (C99), which has the following paragraph (section 6.3.2.3):

Regardless how an invalid pointer is created, any use of it yields undefined behavior. Even assignment, comparison with a null pointer constant, or comparison with itself, might on some systems result in an exception.

I'm not sure what the state of affairs is for C++, but given the answers on the linked thread, I'd consider uses of invalid pointers to result in undefined behavior. Note, though, that assignment is not the same as initialization, so I'm not sure whether initialization is considered a use.

Mário Feroldi
  • Every time you `delete` a pointer it will point to invalid memory. As long as you don't try to dereference it all is ok. I don't see why this would be any different. – super May 26 '18 at 20:11
  • @super "*As long as you don't try to dereference it all is ok.*" This is wrong. See the quotes from the standard provided by OP. – melpomene May 26 '18 at 20:16
  • The UB happens during the read from `p`. It does not matter whether this reading of a pointer value is part of an initializer (`int *q = p;`) or assignment (`q = p;`). – melpomene May 26 '18 at 20:20
  • @melpomene, I'm struggling to find a paragraph in the C++ standard that addresses it. – Mário Feroldi May 26 '18 at 20:25
  • I'm not going to look, but I bet the main issue is `` causing UB all by itself. The initialization part is irrelevant. – melpomene May 26 '18 at 20:26
  • Not looking for the language standards, but loading a pointer into a [68000 address register](https://en.wikipedia.org/wiki/Motorola_68000#Architecture) or an [x86 segment register](https://en.wikipedia.org/wiki/X86_memory_segmentation#Protected_mode) would validate the address. If it is not currently mapped to RAM, it would trap. – Bo Persson May 26 '18 at 20:31
  • That only proves *implementation-defined*. And BTW one would expect the implementation to not load into those registers if the pointer is never dereferenced. – rustyx May 26 '18 at 20:41
  • @rustyx - Just pointing to reasons why copying an invalid pointer is not well defined. The guys writing the language standards were well aware of this. And if I had ever written a compiler for the 68000, I would surely have used the dedicated address registers for copying pointers. Why waste a data register for that? – Bo Persson May 26 '18 at 21:47
  • @BoPersson The 68000 only checks the validity of the value stored in an address register when you try to dereference it. These registers are only 'special' in the sense that they can be used in indirect instructions whereas data registers cannot. And while you are correct that loading a segment register with an invalid value will cause an exception in x86, the compiler will only emit code to load such a register when it is going to dereference the pointer. This operation is expensive so the compiler won't do it unless it has to. – Paul Sanders May 27 '18 at 05:11
  • So I think the practice of loading or comparing an invalid pointer value is vanishingly unlikely to cause a problem in practice, and certainly not on any of the processors in common use today. After all, why should the processor perform access checks when it doesn't need to? Even a pointer 1 past the end of an array is technically invalid, but doing that is common practice. (Yes, I know that is special cased in the standard but suppose I allocate the array with `malloc`. How can the compiler now ensure that that location is valid?). Worry more about 16:16 pointers wrapping. – Paul Sanders May 27 '18 at 05:23
  • @PaulSanders: The danger isn't with processors, but with compilers, and the danger is growing, not shrinking. The authors of the Standard were certainly aware that some systems could guarantee the behavior of performing all operations other than dereferencing on pointer values with any possible bit pattern, that such guarantees could be useful, and that some existing code relied upon such guarantees. Because there are some platforms where it would be impractical to offer such guarantees, the authors of the Standard left the question of whether to offer such guarantees up to compiler writers. – supercat May 29 '18 at 17:10
  • @PaulSanders: Unfortunately, some compiler writers have interpreted the Standard writers' decision not to mandate that compilers behave that way even on platforms where it would be impractical, as an invitation to assume that any code which would rely upon such behavior should be viewed as "broken" and there is no need to support such code. The published rationale for the Standard notes that the authors view such things as a "quality of implementation" matter, but compiler writers are more interested in "optimization" than the quality of semantics an implementation can offer. – supercat May 29 '18 at 17:18
  • @supercat OK, I can sort-of see where you're coming from, but faced with something like `return (char *) 1;` what do you expect such a compiler to actually _do_? Surely it would be perverse to do anything other than stick 1 in `rax` and return. Compiler writers must know that a lot of existing code does things like this (it's even enshrined in the Windows API, in the form of the `MAKEINTRESOURCE` macro) so why would they set out to break it? There is no cost to them in doing what the author of the source code has asked them to do, so why not? – Paul Sanders May 29 '18 at 19:09
  • @PaulSanders: The problem isn't with things like `return (char*)1;` but rather with other fun things a compiler might do. For example, if a compiler which is given an expression like `(q-p1)+p2` concludes that `p1` and `p2` will be equal within the lifetime of `*p1`, it might replace such an expression with `q`. That would be a useful optimization if code never needed to compute `q-p1` after the lifetime of `*p1` ended, but if `p1` had been the address of a pointer passed to `realloc`, `q` had been something within that block, and `p2` was the block's new address... – supercat May 29 '18 at 19:18
  • ...then being able to compute `(q-p1)` whether or not the block had been relocated may be nicer than having to compute the offsets of everything prior to the `realloc`. Likewise, if a function takes a pointer to, and the size of, some data, and specifies that (NULL,0) indicates the "no data" case, a compiler that sees something like `uint8_t *endptr = dat+size; for (uint8_t *p = dat; p < endptr; p++) ...` might conclude it can omit tests later in the code that would check whether `dat` is null [since if `dat` were null, `dat+size` would have yielded UB]. – supercat May 29 '18 at 19:25
  • I doubt the value of such optimizations would generally be sufficient to justify the effort to find them, outside of contrived situations, but compiler writers do not regard things like the inability to process `p+n` in a way that will always yield `p` when `n` is zero, even when `p` happens to be null, as reducing the quality of a compiler. – supercat May 29 '18 at 19:30
  • @supercat Since when is that last one UB? – curiousguy May 29 '18 at 20:16
  • @curiousguy: I sometimes forget that C++ defines some corner cases that C should but doesn't. In any case, my point is that compilers may decide that an operation may be replaced by something "simpler" whose behaviors would match in all cases defined by the Standard, even if processing the behavior in a fashion characteristic of the environment [one of the ways the Standard suggests that quality implementations may treat UB] would have been useful. – supercat May 29 '18 at 20:21
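
The `realloc` scenario discussed in the comments above can be made portable by computing the offset before the block may move, so the old (possibly invalid) pointer value is never used afterwards. The following is only a sketch: the name `grow_and_rebase` and its signature are hypothetical, and it assumes `cursor` points into the block, `new_size` is nonzero, and `new_size` is at least the cursor's offset.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical helper: grows a malloc'ed block and returns the cursor's
// new position. `block` is updated in place on success.
// Assumes cursor lies within the block and 0 < offset-of-cursor <= new_size
// is respected by the caller (new_size nonzero, not smaller than the offset).
char *grow_and_rebase(char *&block, char *cursor, std::size_t new_size)
{
    assert(block != nullptr && cursor >= block);

    // Compute the offset while both pointers are still valid.
    const std::ptrdiff_t offset = cursor - block;

    void *moved = std::realloc(block, new_size);
    if (moved == nullptr)
        return nullptr;  // failure: block is untouched and the caller's cursor stays valid

    // On success the old value of block may be invalid; overwrite it without reading it.
    block = static_cast<char *>(moved);
    return block + offset;
}
```

The design choice here is simply to trade the convenience of `(q-p1)+p2` for an offset saved ahead of time, which avoids relying on what a given compiler does with an invalid pointer value.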

1 Answer


You’ve all but answered this yourself: it’s implementation-defined, not undefined, in C++. The standard says just what you quoted (which I found by consulting the appropriate index). It doesn’t matter whether it’s initialization or assignment: the lvalue-to-rvalue conversion on the pointer object explicitly constitutes a use.
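
To make the distinction concrete, here is a small sketch of operations that name `p` without performing an lvalue-to-rvalue conversion on it, and therefore do not read the invalid value. The function name `reset_without_reading` is hypothetical, and the comments reflect one reading of the rule stated above rather than wording taken from the standard.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: after the delete, p is only addressed, measured,
// and overwritten; its invalid value is never read.
bool reset_without_reading()
{
    int *p = new int(7);
    delete p;                        // p now holds an invalid pointer value

    int **pp = &p;                   // address of the object p; its stored value is not read
    const std::size_t n = sizeof p;  // unevaluated operand; no read takes place
    *pp = nullptr;                   // overwrites p; the old value is never read

    int *q = p;                      // reads p, which now holds a null pointer value
    assert(n == sizeof(int *));
    return q == nullptr;
}
```

By contrast, `int *q = p;` placed directly after the `delete` would read the invalid value, which is the case the question asks about.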

Davis Herring