The following code (or an equivalent that casts a null literal directly, avoiding the temporary variable) is often used to calculate the offset of a specific member variable within a class or struct:

class Class {
public:
    int first;
    int second;
};

Class* ptr = 0;
size_t offset = reinterpret_cast<char*>(&ptr->second) -
                 reinterpret_cast<char*>(ptr);

&ptr->second looks like it is equivalent to the following:

&(ptr->second)

which in turn is equivalent to

&((*ptr).second)

which dereferences an object instance pointer and yields undefined behavior for null pointers.

So is the original fine or does it yield UB?

sharptooth
  • `offset` as in `offsetof` should be a `size_t` as you have, but the difference of two pointers should be `ptrdiff_t`, so something's wrong here – Paul Evans Sep 08 '14 at 13:27
  • Are you trying to implement [`offsetof`](http://en.cppreference.com/w/cpp/types/offsetof)? Looking at its "possible implementation", the answer to your question is **no**, it is not UB (in case the class is a standard layout type; also mentioned in the linked page). – leemes Sep 08 '14 at 13:27
  • @leemes cppreference is not the standard. – Yakk - Adam Nevraumont Sep 08 '14 at 13:33
  • You do not **really** dereference `ptr`. The expression `ptr->second` (or `(*ptr).second`) yields a *reference* to the member variable, not the value. Then you immediately convert the reference to a pointer. If you'd access the reference, it would be UB. – leemes Sep 08 '14 at 13:33
  • @Yakk So? I didn't say that. – leemes Sep 08 '14 at 13:35
  • @leemes: The code in this question http://stackoverflow.com/q/25719244/57428 doesn't access anything either yet it contains UB. – sharptooth Sep 08 '14 at 13:35
  • @leemes The "possible implementation" of `offsetof` on cppreference is **not** strong evidence that dereferencing null is not UB. It is probably observational -- there are compilers under which `offsetof` is implemented exactly like that. The fact that a compiler-provided `offsetof` is implemented that way does not say if the behavior is UB in C++: it strongly implies that it is defined **in that particular compiler** (or the compiler has a bug in its implementation of `offsetof`). Compilers are free to define UB themselves: their headers are not places to find guaranteed defined behavior in C++. – Yakk - Adam Nevraumont Sep 08 '14 at 13:49
  • @sharptooth Yes, but that is a method rather than a value. The answer clearly states that this becomes a problem iff the function ever becomes virtual (which would require a vtable, which is a hidden member of an initialized class!). And is UGLY AS HELL. – IdeaHat Sep 08 '14 at 13:49
  • @leemes So `&ptr->second` is not UB and `ptr->second` is? – David G Sep 08 '14 at 13:54
  • The question of "whether dereferencing a null pointer without causing a lvalue-to-rvalue conversion is UB" is the subject of [CWG issue 232](http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#232), which is still open. – T.C. Sep 08 '14 at 13:56
  • I think C++ has pretty strict rules on what pointers are legal. I recall reading (but can't recall *where*) that a pointer that was not derived from a valid object/memory location is technically undefined behavior (merely calculating its value; not even dereferencing it). Here, `&ptr->second` would be a bogus pointer to invalid memory, and as I recall, merely computing its value is undefined behavior. But I don't have a good reference for this and don't have time to find one, so take what I'm saying with a (large) grain of salt. – Cornstalks Sep 08 '14 at 14:00
  • @Cornstalks You are probably thinking of the safely-derived pointer rules in §3.7.4.3 [basic.stc.dynamic.safety]. Computing an unsafely-derived pointer value isn't UB, using it is UB in implementations with strict pointer safety (which is pretty much an empty set). – T.C. Sep 08 '14 at 14:19
  • @t.c. interesting... I removed certainty from my answer thanks to you meddling kids. – Yakk - Adam Nevraumont Sep 08 '14 at 14:38
  • @T.C.: Ah, that might be it. Thanks. – Cornstalks Sep 08 '14 at 15:15

1 Answer

Despite the fact that it does nothing, char* foo = 0; *foo; could be undefined behavior.

Dereferencing a null pointer could be undefined behavior. And yes, ptr->foo is equivalent to (*ptr).foo, and *ptr dereferences a null pointer.

There is currently an open issue before the working group (CWG issue 232, linked in the comments on the question) about whether *(char*)0 is undefined behavior if you don't read from or write to it. Parts of the standard imply it is, other parts imply it is not. The current notes there seem to lean towards making it defined.

Now, this is in theory. How about in practice?

Under most compilers, this works because no checks are done at dereferencing time: the memory around the address a null pointer points to is guarded against access, and the above expression simply takes the address of something near null; it does not read or write the value there.

This is why the cppreference page for offsetof lists pretty much that trick as a possible implementation. The fact that some (many? most? every one I've checked?) compilers implement offsetof in a similar or equivalent manner does not mean that the behavior is well defined under the C++ standard.
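For comparison, here is a minimal sketch of getting the same offset through the standard `offsetof` macro from `<cstddef>` instead of the hand-written null-pointer trick. The `Class` type is the one from the question; the helper name `second_offset` is made up for illustration. Whatever the implementation does internally is then its own business and not something user code has to rely on.

```cpp
#include <cstddef>      // offsetof, std::size_t
#include <type_traits>  // std::is_standard_layout

// Same layout as the class in the question.
struct Class {
    int first;
    int second;
};

// offsetof is only unconditionally supported for standard-layout types.
static_assert(std::is_standard_layout<Class>::value,
              "offsetof is only guaranteed for standard-layout types");

// Hypothetical helper: returns the byte offset of Class::second.
std::size_t second_offset() {
    return offsetof(Class, second);
}
```

On typical implementations this returns `sizeof(int)`, although the standard only guarantees that `second` is placed after `first`.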

However, given the ambiguity, compilers are free to add checks at every instruction that dereferences a pointer, and execute arbitrary code (fail-fast error reporting, for example) if null is indeed dereferenced. Such instrumentation might even be useful to find bugs where they occur, instead of where the symptom occurs. And on systems where there is writable memory near 0, such instrumentation could be key (pre-OS X Mac OS had some writable memory that controlled system functions near 0).

Such compilers could still write offsetof that way, and introduce pragmas or the like to block the instrumentation in the generated code. Or they could switch to an intrinsic.

Going a step further, C++ leaves lots of latitude on how non-standard-layout data is arranged. In theory, classes could be implemented as rather complex data structures and not the nearly standard-layout structures we have grown to expect, and the code would still be valid C++. Accessing member variables of non-standard-layout types and taking their address could be problematic: I do not know if there is any guarantee that the offset of a member variable in a non-standard-layout type does not change between instances!
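One way to guard against the non-standard-layout case is to check the type at compile time before computing any offsets. The sketch below uses `std::is_standard_layout` from `<type_traits>`; the types `Plain` and `WithVirtual` and the helper `offset_is_portable` are hypothetical names introduced only for this example.

```cpp
#include <type_traits>  // std::is_standard_layout

// Two plain public members: a standard-layout type.
struct Plain {
    int first;
    int second;
};

// A virtual function makes the type non-standard-layout; the
// implementation may add hidden data such as a vtable pointer.
struct WithVirtual {
    virtual ~WithVirtual() {}
    int first;
    int second;
};

// Hypothetical helper: true if member offsets of T are covered by
// the standard's layout guarantees.
template <typename T>
constexpr bool offset_is_portable() {
    return std::is_standard_layout<T>::value;
}

static_assert(offset_is_portable<Plain>(), "Plain is standard-layout");
static_assert(!offset_is_portable<WithVirtual>(),
              "WithVirtual is not standard-layout");
```

Code that computes member offsets could place such a `static_assert` next to the computation, so that a later change to the class (adding a virtual function, say) fails at compile time instead of silently changing behavior.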

Finally, some compilers have aggressive optimization settings that find code that executes undefined behavior (at least under certain branches or conditions), and use that to mark the branch as unreachable. If it is decided that null dereference is undefined behavior, this could be a problem. A classic example is gcc's aggressive signed integer overflow branch eliminator. If the standard dictates something is undefined behavior, the compiler is free to consider that branch unreachable. If the null dereference is not behind a branch in a function, the compiler is free to declare all code that calls that function to be unreachable, and recurse.
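The signed overflow case can be sketched in a few lines. The function names below are invented for illustration, and whether a particular compiler actually folds the first comparison depends on its version and optimization flags; the point is only that it is allowed to.

```cpp
#include <climits>  // INT_MAX

// Signed overflow is undefined behavior, so an optimizer may assume
// x + 1 never overflows and fold this comparison to `true`,
// removing any branch that depends on it being false.
bool next_is_larger(int x) {
    return x + 1 > x;  // UB when x == INT_MAX
}

// Well-defined alternative: compare against the limit before adding.
bool next_is_larger_checked(int x) {
    return x < INT_MAX;
}
```

If null dereference without access were settled as undefined, the same reasoning could be applied to any code path containing `&ptr->second` with a null `ptr`.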

And it would be free to start doing this not in the current version of your compiler, but in the next one.

Writing code that is standards-valid is not just about writing code that compiles cleanly today. While the degree to which dereferencing a null pointer without using the result is defined is currently ambiguous, relying on something that is only ambiguously defined is risky.
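If `offsetof` is not an option, one way to sidestep the null pointer entirely is to measure the offset on a real object. This is only a sketch: it assumes the type is default-constructible and standard-layout, the helper name `second_offset_from_object` is made up, and it relies on the usual practice of walking an object's bytes through `char` pointers.

```cpp
#include <cstddef>  // std::size_t, offsetof

// Same layout as the class in the question.
struct Class {
    int first;
    int second;
};

// Hypothetical helper: measures the offset of `second` on a live
// object, so no null pointer is ever dereferenced.
std::size_t second_offset_from_object() {
    Class instance{};  // a real object with valid storage
    const char* base = reinterpret_cast<const char*>(&instance);
    const char* member = reinterpret_cast<const char*>(&instance.second);
    return static_cast<std::size_t>(member - base);
}
```

For a standard-layout type the result should match `offsetof(Class, second)`.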

Yakk - Adam Nevraumont