9

Ok, so I know that technically this is undefined behavior, but nonetheless, I've seen this more than once in production code. And please correct me if I'm wrong, but I've also heard that some people use this "feature" as a somewhat legitimate substitute for a lacking aspect of the current C++ standard, namely, the inability to obtain the address (well, offset really) of a member function. For example, this is out of a popular implementation of a PCRE (Perl-compatible Regular Expression) library:

#ifndef offsetof
#define offsetof(p_type,field) ((size_t)&(((p_type *)0)->field))
#endif

One can debate whether the exploitation of such a language subtlety in a case like this is valid or not, or even necessary, but I've also seen it used like this:

struct Result
{
   void stat()
   {
      if(this)
         // do something...
      else
         // do something else...
   }
};

// ...somewhere else in the code...

((Result*)0)->stat();

This works just fine! It avoids a null pointer dereference by testing for the existence of this, and it does not try to access class members in the else block. So long as these guards are in place, it's legitimate code, right? So the question remains: Is there a practical use case, where one would benefit from using such a construct? I'm especially concerned about the second case, since the first case is more of a workaround for a language limitation. Or is it?

PS. Sorry about the C-style casts, unfortunately people still prefer to type less if they can.

M.M
  • 138,810
  • 21
  • 208
  • 365
Zoli
  • 1,137
  • 7
  • 12
  • 1
    "So long as these guards are in place, it's legitimate code, right?" Absolutely not. As you said in the first sentence, the result is undefined. – James McNellis Apr 02 '10 at 22:57
  • possible duplicate of http://stackoverflow.com/questions/2511921/dereferencing-the-null-pointer – nobody Apr 02 '10 at 22:59

7 Answers7

9

The first case is not calling anything. It's taking the address. That's a defined, permitted, operation. It yields the offset in bytes from the start of the object to the specified field. This is a very, very, common practice, since offsets like this are very commonly needed. Not all objects can be created on the stack, after all.

The second case is reasonably silly. The sensible thing would be to declare that method static.

bmargulies
  • 97,814
  • 39
  • 186
  • 310
  • I'm aware of that. But is this common practice? Why would one do this, instead of just creating an object on the stack? Is is really worth the fancy syntax? Or are we super-optimizing something that I'm missing? :) – Zoli Apr 02 '10 at 22:58
  • 4
    Why do people write bad code? Only the original author would be able to explain the motivation behind this abuse of the language. – Lars Apr 02 '10 at 23:07
  • Got a reference to where it is defined and permitted? (Not that I doubt you, but it's always nice to have a proper source) – jalf Apr 03 '10 at 00:53
  • 1
    `E1->E2` is equivalent to `(*(E1)).E2`. `*E1` dereferences the pointer `E1`. Dereferencing a null pointer results in undefined behavior. Ergo, I don't see how the `offsetof` macro in the OP consists of defined, permitted behavior. Is there something I am missing? (the examples with `E1` and `E2` are from the C++ standard). – James McNellis Apr 03 '10 at 02:39
  • To begin with, offsetof is a card-carrying member of stddef.h. Now, you will point out, that just requires the compiler to support some trick to make it work, not necessarily this one. I believe, but I'm not in a position to prove it, that the standard says that &(a->b) shall not deference a, just calculate the address of b. There sure is lots of code out there that depends on this, not just with 0 in a. – bmargulies Apr 03 '10 at 02:51
  • 2
    C99 does say concerning the `&` operator, "if the operand is the result of a unary `*` operator, neither that operator nor the `&` operator is evaluated and the result is as if both were omitted" (§6.5.3.2/3). C89, C++03, and the C++0x FCD don't have any language to that effect. In any case, while that allows for `&*a` where `a` does not point to an object, I don't think it allows for `&((*a).b)`. I don't know; I agree that given this common implementation of `offsetof`, I'm probably wrong; I just can't find where the documents say that I'm wrong. Hopefully someone else knows better. :-/ – James McNellis Apr 03 '10 at 04:23
4

I don't see any benefit of ((Result*)0)->stat(); - it is an ugly hack which will likely break sooner than later. The proper C++ approach would be using a static method Result::stat() .

offsetof() on the other hand is legal, as the offsetof() macro never actually calls a method or accesses a member, but only performs address calculations.

Lars
  • 1,377
  • 6
  • 6
4

Everybody else has done a good job of reiterating that the behavior is undefined. But lets pretend it wasn't, and that p->member is allowed to behave in a consistent manner under certain circumstances even if p isn't a valid pointer.

Your second construct would still serve almost no purpose. From a design perspective, you've probably done something wrong if a single function can do its job both with and without accessing members, and if it can then splitting the static portion of the code into a separate, static function would be much more reasonable than expecting your users to create a null pointer to operate on.

From a safety perspective, you've only protected against a small portion of the ways an invalid this pointer can be created. There's uninitialized pointers, for starters:

Result* p;
p->stat(); //Oops, 'this' is some random value

There's pointers that have been initialized, but are still invalid:

Result* p = new Result;
delete p;
p->stat(); //'this' points to "safe" memory, but the data doesn't belong to you

And even if you always initialize your pointers, and absolutely never accidentally reuse free'd memory:

struct Struct {
    int i;
    Result r;
}
int main() {
    ((Struct*)0)->r.stat(); //'this' is likely sizeof(int), not 0
}

So really, even if it weren't undefined behavior, it is worthless behavior.

Dennis Zickefoose
  • 10,791
  • 3
  • 29
  • 38
3

Although libraries targeting specific C++ implementations may do this, that doesn't mean it's "legitimate" generally.

This works just fine! It avoids a null pointer dereference by testing for the existence of this, and it does not try to access class members in the else block. So long as these guards are in place, it's legitimate code, right?

No, because although it might work fine on some C++ implementations, it is perfectly okay for it to not work on any conforming C++ implementation.

Daniel Earwicker
  • 114,894
  • 38
  • 205
  • 284
3

Dereferencing a null-pointer is undefined behavior and anything can happen if you do it. Don't do it if you want a program that works.

Just because it doesn't immediately crash in one specific test case doesn't mean that it won't get you into all kinds of trouble.

sth
  • 222,467
  • 53
  • 283
  • 367
2

Undefined behaviour is undefined behaviour. Do this tricks "work" for your particular compiler? well, possibly. will they work for the next iteration of it. or for another compiler? Possibly not. You pays your money and you takes your choice. I can only say that in nearly 25 years of C++ programming I've never felt the need to do any of these things.

  • It works on GCC 4.3.3 and VS 2005. I know it's undefined behavior, but since I've seen it a few times now, I began to wonder whether it's something that people often do, and if so, why? – Zoli Apr 02 '10 at 23:04
1

Regarding the statement:

It avoids a null pointer dereference by testing for the existence of this, and it does not try to access class members in the else block. So long as these guards are in place, it's legitimate code, right?

The code is not legitimate. There is no guarantee that the compiler and/or runtime will actually call to the method when the pointer is NULL. The checking in the method is of no help because you can't assume that the method will actually end up being called with a NULL this pointer.

nobody
  • 19,814
  • 17
  • 56
  • 77