3

If you know two pieces of information:

  1. A memory address.
  2. The type of the object stored at that address.

Then you logically have all you need to reference that object:

#include <iostream>
using namespace std;

int main()
{
    int x = 1, y = 2;
    int* p = (&x) + 1;
    if ((long)&y == (long)p)
        cout << "p now contains &y\n";
    if (*p == y)
        cout << "it also dereferences to y\n";
}

However, this isn't legal per the C++ standard. It works in several compilers I tried, but it's Undefined Behavior.

The question is: why?

Dun Peal
  • If *you* know two pieces of information, that doesn't necessarily mean the *compiler* knows it, nor does it necessarily trust you :-) – paxdiablo Nov 08 '18 at 00:42
  • Compiles and runs on Wandbox: https://wandbox.org/permlink/mMxs9RzqMYWl5hYw – Henri Menke Nov 08 '18 at 00:43
  • Try turning on optimisation options and watch the code die horribly. – Ken Y-N Nov 08 '18 at 00:43
  • @HenriMenke Doesn't mean it's valid ;) – Rakete1111 Nov 08 '18 at 00:43
  • I wonder: does this question (or answers to such) change at all if `&y` is *not* in the program? It seems that `y` might not need to "exist at an address", much less "one after" `x`. That is, if the behavior wasn't UB it would need to be defined, and this would *force compilers to codify/honor* several assumptions. – user2864740 Nov 08 '18 at 00:44
  • @HenriMenke [apropos comment](https://stackoverflow.com/questions/24296571/why-does-this-loop-produce-warning-iteration-3u-invokes-undefined-behavior-an/24297811#comment37622724_24296571) – Shafik Yaghmour Nov 08 '18 at 00:44
  • @Rakete1111 That is true but I thought the question was about compiler errors. – Henri Menke Nov 08 '18 at 00:44
  • Because what you think you know [may not be true](http://coliru.stacked-crooked.com/a/32242b603c12e0de). – Miles Budnek Nov 08 '18 at 00:48
  • *However, this isn't legal per the C++ standard.* -- I'm not seeing where any of this code violates the C++ standard. – PaulMcKenzie Nov 08 '18 at 00:50
  • @PaulMcKenzie assuming two variables are adjacent in memory is a bad assumption to make. The standard makes no guarantees about how variables are stored. – Remy Lebeau Nov 08 '18 at 00:51
  • @PaulMcKenzie: `*p==y` is definitely UB. – geza Nov 08 '18 at 00:51
  • @geza Yes, UB, but where does this code violate the C++ standard? Returning a pointer to a local variable is also UB, but it does not violate the C++ standard. – PaulMcKenzie Nov 08 '18 at 00:52
  • @PaulMcKenzie: okay :) People maybe mean different things when they say "legal per the C++ standard", "violates the C++ standard". – geza Nov 08 '18 at 00:55
  • Actually, what do you mean by "why": "in what way does the standard undefine this" or "why does the standard make that choice"? – harold Nov 08 '18 at 00:55
  • @harold I mean both. – Dun Peal Nov 08 '18 at 01:21
  • [segmented architectures are also a reason](https://stackoverflow.com/a/31151779/1708801) – Shafik Yaghmour Nov 08 '18 at 01:37
  • That was not UB before C++17: https://stackoverflow.com/questions/48062346/is-a-pointer-with-the-right-address-and-type-still-always-a-valid-pointer-since – Oliv Nov 08 '18 at 04:27

4 Answers

6

It wreaks havoc with optimizations.

void f(int* x);

int g() {
    int x = 1, y = 2;
    f(&x);
    return y;
}

If you can validly "guess" the address of y from x's address, then the call to f may modify y and so the return statement must reload the value of y from memory.

Now consider a typical function with more local variables and more calls to other functions, where you'd have to save the value of every variable to memory before each call (because the called function may inspect them) and reload them after each call (because the called function may have modified them).
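To make the point concrete, here is a minimal sketch of the same `g` with a hypothetical body for `f` added so that it compiles on its own (in the scenario above, `f` would live in another translation unit and be opaque to the compiler). Because only `x`'s address is ever taken, the compiler is allowed to assume `f` cannot reach `y`.

```cpp
// Hypothetical callee, defined here only so the sketch is self-contained.
// It writes through the pointer it was given and nothing else.
void f(int* x) { *x = 42; }

int g() {
    int x = 1, y = 2;
    f(&x);     // only x's address escapes to the callee
    return y;  // y's address never escaped, so this may be folded to `return 2;`
}
```

Calling `g()` returns 2 no matter what `f` does with `x`, and `y` may never be given a memory location at all.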

T.C.
  • Thanks, it is clear now. Here's the exact sentence in the standard stating that this is UB: http://eel.is/c++draft/basic.compound#3.sentence-10 – Dun Peal Nov 08 '18 at 03:18
  • You mean the exact sentence I linked in my answer ;) – paddy Nov 08 '18 at 03:23
5

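If you want to treat pointers as a numeric type, firstly you need to use std::uintptr_t, not long, because long is not guaranteed to be wide enough to hold a pointer (it is 32 bits on 64-bit Windows, for example). That's the first problem, but not the one you're talking about.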
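As a rough sketch of the integer conversion done with std::uintptr_t (the helper name `round_trips` is made up for illustration, and std::uintptr_t is an optional type, so this assumes your implementation provides it):

```cpp
#include <cstdint>

// Converts a pointer to an integer wide enough to hold it, then back again.
// The numeric value itself is implementation-defined; only the round trip
// back to the same pointer type is guaranteed to give the original pointer.
bool round_trips(int* p) {
    std::uintptr_t bits = reinterpret_cast<std::uintptr_t>(p);
    return reinterpret_cast<int*>(bits) == p;
}
```

This only shows how to get an integer representation safely. It says nothing about whether an integer computed from &x is a valid way to reach y.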

> It works in several compilers I tried, but it's Undefined Behavior.
>
> The question is: why?

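Okay, so the comments section went off when I called this undefined behavior. It's actually unspecified behavior: the standard lets the implementation choose the result and, unlike implementation-defined behavior, does not require it to document that choice.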

You are trying to compare two distinctly unrelated pointers:

  • &x + 1
  • &y

The pointer &x+1 is a one-past-the-end pointer. The standard allows you to form such a pointer, but not to dereference it, and the result of comparing it is only specified when the other pointer is also based on x. The result is unspecified if you compare it with a pointer to a different complete object: [expr.eq § 3.1]

The compiler is free to put y anywhere it chooses, including in a register. As such, there is no guarantee that &y and &x+1 are related.
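For contrast, here is a small sketch of what a one-past-the-end pointer is meant for: marking the end of a range that starts at the same object. The function name `sum` is just an illustration.

```cpp
// Sums the ints in the half-open range [first, last).
// `last` is only ever compared against, never dereferenced.
int sum(const int* first, const int* last) {
    int total = 0;
    for (; first != last; ++first)
        total += *first;
    return total;
}
```

With `int x = 5;`, the call `sum(&x, &x + 1)` is well-defined and returns 5, because for pointer arithmetic a single object behaves like an array of one element and both pointers are based on x.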

As an exercise to someone who wants to show whether this is in fact undefined behavior or not, perhaps start here:

  • [basic.stc.dynamic.safety § 3.4]:

    An integer value is an integer representation of a safely-derived pointer only if its type is at least as large as std::intptr_t and it is one of the following: ...

    3.4 the result of an additive or bitwise operation, one of whose operands is an integer representation of a safely-derived pointer value P, if that result converted by reinterpret_cast<void*> would compare equal to a safely-derived pointer computable from reinterpret_cast<void*>(P).

  • [basic.compound § 3.4] :

    Note: A pointer past the end of an object ([expr.add]) is not considered to point to an unrelated object of the object's type that might be located at that address.

paddy
2

If you know the address and type of an object and your implementation has relaxed pointer safety [basic.stc.dynamic.safety §4], then I think it should be legal to access the object at that address through an appropriate lvalue.

The problem is that the standard does not guarantee that local variables of the same type are allocated contiguously with addresses increasing in order of declaration. So you cannot derive the address of y based on that computation you do with the address of x. Apart from that, pointer arithmetic would lead to undefined behavior if you go more than one element past an object ([expr.add]). So while (&x) + 1 is not undefined behavior yet, just the act of even computing (&x) + 2 would be…
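If you actually need two ints to be adjacent, a sketch of the well-defined alternative is to put them in one array, where the standard does guarantee contiguous layout. The names `xy` and `second_via_first` are made up for this example.

```cpp
// Both values live in the same array, so stepping from the first element
// to the second is ordinary, well-defined pointer arithmetic.
int second_via_first() {
    int xy[2] = {1, 2};
    int* p = &xy[0] + 1;  // points at xy[1], an element of the same array
    return *p;            // reads xy[1]
}
```

Here `second_via_first()` returns 2, and the compiler knows `p` may alias `xy[1]` because both come from the same array.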

Michael Kenzel
0

The code is legal per the C++ standard (i.e. it should compile), but as you already noted, the behaviour is undefined. This is because the order in which variables are declared does not determine how they are arranged in memory.

Henning Koehler