Consider the following example:

```cpp
#include <iostream>

class Base {
public:
    int data_;
};

class Derived : public Base {
public:
    void fun() { ::std::cout << "Hi, I'm " << this << ::std::endl; }
};

int main() {
    Base base;
    Derived *derived = static_cast<Derived*>(&base); // Undefined behavior!

    derived->fun();

    return 0;
}
```

The function call is obviously undefined behavior according to the C++ standard. But on all available machines and compilers (VC2005/2008, gcc on RH Linux and SunOS) it works as expected (prints "Hi, I'm " followed by an address). Does anyone know a configuration on which this code could work incorrectly? Or maybe a more complicated example based on the same idea (note that Derived shouldn't carry any additional data anyway)?

Update:

From the standard, 5.2.9/8:

An rvalue of type “pointer to cv1 B”, where B is a class type, can be converted to an rvalue of type “pointer to cv2 D”, where D is a class derived (clause 10) from B, if a valid standard conversion from “pointer to D” to “pointer to B” exists (4.10), cv2 is the same cv-qualification as, or greater cv-qualification than, cv1, and B is not a virtual base class of D. The null pointer value (4.10) is converted to the null pointer value of the destination type. If the rvalue of type “pointer to cv1 B” points to a B that is actually a subobject of an object of type D, the resulting pointer points to the enclosing object of type D. Otherwise, the result of the cast is undefined.
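For contrast, here is a minimal sketch of the case 5.2.9/8 does define: the `Base*` genuinely points to the `Base` subobject of a `Derived`, so the `static_cast` recovers the enclosing object. The helper `roundTrip` is a hypothetical name used only for illustration.

```cpp
#include <cassert>

class Base {
public:
    int data_;
};

class Derived : public Base {};

// Well defined per 5.2.9/8: asBase really points to the Base subobject of a
// Derived, so static_cast yields a pointer to the enclosing Derived object.
Derived *roundTrip(Derived &object) {
    Base *asBase = &object;                          // derived-to-base (4.10)
    Derived *back = static_cast<Derived *>(asBase);  // base-to-derived, valid here
    assert(back == &object);                         // the original address comes back
    return back;
}
```

The question's code differs only in that `&base` points to a complete `Base` object rather than to a subobject of a `Derived`, which is exactly the "otherwise" clause of the quoted rule.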

And one more, 9.3.1 (thanks @Agent_L):

If a nonstatic member function of a class X is called for an object that is not of type X, or of a type derived from X, the behavior is undefined.

Thanks, Mike.

anxieux
  • What do you mean by "real undefined behaviour"? Isn't the undefined behaviour as defined in the standard real enough? – PlasmaHH Apr 23 '12 at 09:50
  • @PlasmaHH, obviously, anyone who writes such code expects it to have some real behavior (printing "Hi!") in this example. But "undefined behavior" in the standard means that the compiler can generate code with different behavior. Such a situation is what I call "real undefined behavior", and I think it makes sense. – anxieux Apr 23 '12 at 09:56
  • How do you define 'incorrectly'? If the standard states that it's undefined then surely what it's doing is by definition neither correct nor incorrect. I'm not sure where you're going with this but generally, undefined behaviour scenarios are something I'd think you'd want to avoid. – Component 10 Apr 23 '12 at 10:07
  • @Component10, since this is UB, anything the compiler does is "correct" insofar as compliance with the standard is concerned. UB gives the compiler / run-time environment free rein to do anything whatsoever and still be compliant. (One of the many reasons why a programmer should want to avoid UB). – David Hammen Apr 23 '12 at 10:35
  • @anxieux: what if your compiler can detect this and starts nethack, is this real undefined behaviour too? Why would you even want to care about whether undefined behaviour in a particular compiler, for some particular code, coincidentally produces something that appears to have reliable behaviour? It still is UB. – PlasmaHH Apr 23 '12 at 11:11
  • @PlasmaHH Agree with your example. I should amend my phrase: 'But "undefined behavior" in the standard means that the compiler can generate code with different behavior **or do something other than generating code**.' Maybe that is not very friendly, but I don't want to discuss here why I'm asking this question. It could cause too much noise :) – anxieux Apr 23 '12 at 11:21
  • The part of the standard quoted here is not the only relevant one, because it only defines what value the pointer "derived" will have. The other issue is calling fun for an object that is not of type Derived. – Agent_L Apr 23 '12 at 11:23
  • @Agent_L Thanks, added your item as well. – anxieux Apr 23 '12 at 11:29

6 Answers

The function fun() doesn't actually do anything where it matters what the this pointer is, and as it isn't a virtual function, there's nothing special needed to look up the function. Basically, it's called like any normal (non-member) function, with a bad this pointer. It just doesn't crash, which is perfectly valid undefined behavior (if that's not a contradiction).
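To make this concrete, here is a minimal sketch assuming the usual implementation model (not something the standard specifies): a non-virtual member call compiles to an ordinary function call with the object pointer passed as a hidden first argument. `Derived_fun` is a hypothetical hand-written equivalent, and `fun` writes to a stream here so the two outputs can be compared.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

class Base {
public:
    int data_;
};

class Derived : public Base {
public:
    // Non-virtual member function, as in the question; it writes to a
    // caller-supplied stream so the result can be inspected.
    void fun(std::ostream &out) const { out << "Hi, I'm " << this; }
};

// Hypothetical hand-lowered equivalent of Derived::fun under the usual
// implementation model: the hidden `this` becomes an explicit parameter.
// No vtable is consulted because fun is not virtual, and the pointer is
// only printed, never dereferenced.
void Derived_fun(const Derived *self, std::ostream &out) {
    out << "Hi, I'm " << static_cast<const void *>(self);
}
```

For a real `Derived d`, `d.fun(out)` and `Derived_fun(&d, out)` print the same text. Neither reads the object's memory, which is why the question's bad pointer typically goes unnoticed in practice.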

BoBTFish
  • OK, we can add printing `data_` to the `fun()` body. – anxieux Apr 23 '12 at 09:58
  • This part is irrelevant (if not misleading) >> *"The function fun() doesn't actually do anything that matters what the this pointer is, and as it isn't a virtual function, there's nothing special needed to look up the function"* – Nawaz Apr 23 '12 at 09:59
  • Ok, how about printing `this`? – anxieux Apr 23 '12 at 10:02
  • @EJP: Because it is UB, which means it *might* work even if there is virtual function. If that is so, then that part (which I quoted) is irrelevant if not misleading. – Nawaz Apr 23 '12 at 10:03
  • @Nawaz And what was misleading? – user207421 Apr 23 '12 at 10:05
  • @EJP: Is that the explanation why it works? Are you sure it will work the next time you run it? Or conversely, are you sure it will not work if there is a virtual function? – Nawaz Apr 23 '12 at 10:07
  • @Nawaz This answer is a perfect explanation of what's happening here. `this` is nothing more than an int, so simply printing it as an int will ALWAYS work correctly. – Agent_L Apr 23 '12 at 10:15
  • @Nawaz *I'm* asking *you* why the posting is both 'irrelevant' and 'misleading'. You haven't answered. Asking me questions isn't an answer. – user207421 Apr 23 '12 at 10:16
  • @EJP: I have already answered that: *"Because it is UB, which means it might work even if there is virtual function. If that is so, then that part (which I quoted) is irrelevant if not misleading."* – Nawaz Apr 23 '12 at 10:18
  • @Nawaz It might work if the moon was blue. It *does* work in this instance because the object doesn't have to be dereferenced to get to its VFT to get the address of the function, because it isn't virtual. If there is anything 'irrelevant' or 'misleading' about that statement, I would like to know what it is. – user207421 Apr 23 '12 at 10:20
  • @EJP: That leads me to infer that it will *surely* not work if `fun` is a virtual function. Is that so? – Nawaz Apr 23 '12 at 10:23
  • @EJP: How exactly? You're saying "it works because fun is not virtual" which immediately implies that "it will not work if fun is virtual". – Nawaz Apr 23 '12 at 10:26
  • @Nawaz No, it doesn't imply that at all. Your inference is invalid. In this particular case I suspect it won't work, but the topic I am addressing is the validity of the answer we are commenting on. – user207421 Apr 23 '12 at 10:27
  • @EJP: Your reasoning doesn't make sense at all. You're thinking retrospectively. – Nawaz Apr 23 '12 at 10:30
  • @Nawaz 'Thinking retrospectively' is meaningless, and it is not my reasoning that is in question here. – user207421 Apr 23 '12 at 10:36
  • @EJP: You didn't explain why my inference is invalid. – Nawaz Apr 23 '12 at 10:38
  • @Nawaz I stated that it works *in this case* because the function `fun()` isn't virtual. You inferred that it will '*surely* not work if `fun` is a virtual function'. *Non sequitur.* You negated an existential quantifier and came up with a universal quantifier. Fail. – user207421 Apr 23 '12 at 10:41
  • @EJP: If it surely works *because* fun isn't virtual, then it does imply that it surely will not work if fun is virtual. – Nawaz Apr 23 '12 at 10:43
  • @Nawaz: strictly speaking it implies that it will not surely work (if fun is virtual). Word order matters. As to BoBTFish's answer, it perfectly explains why a real-world compiler _usually_ doesn't cause a crash. Although the compiler has every right to format a hard drive, of course. – user396672 Apr 23 '12 at 11:01

The comments in the code are incorrect.

```cpp
Derived *derived = static_cast<Derived*>(&base);
derived->fun(); // Undefined behavior!
```

Corrected version:

```cpp
Derived *derived = static_cast<Derived*>(&base);  // Undefined behavior!
derived->fun(); // Uses result of undefined behavior
```

The undefined behavior starts with the static_cast. Any subsequent use of this ill-begotten pointer is also undefined behavior. Undefined behavior is a get-out-of-jail-free card for compiler vendors. Almost any response by the compiler is compliant with the standard.

There's nothing to stop the compiler from rejecting your cast. A nice compiler might well issue a fatal compilation error for that static_cast. The violation is easy to see in this case. In general it is not easy to see, so most compilers don't bother checking.

Most compilers instead take the easiest way out. In this case, the easy way out is to simply pretend that that pointer to an instance of class Base is a pointer to an instance of class Derived. Since your function Derived::fun() is rather benign, the easy way out in this case yields a rather benign result.

Just because you are getting a nice benign result does not mean everything is cool. It is still undefined behavior. The best bet is to never rely on undefined behavior.
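When the program genuinely needs to ask "is this `Base` really a `Derived`?", a minimal sketch of the checked alternative uses `dynamic_cast`. It assumes `Base` can be made polymorphic (here by adding a virtual destructor, which the question's `Base` lacks), because `dynamic_cast` only performs a run-time check on polymorphic types; `asDerived` is a hypothetical helper name.

```cpp
#include <cassert>
#include <iostream>

// Assumption: Base is made polymorphic via a virtual destructor;
// the question's Base is not, and dynamic_cast needs a polymorphic source.
class Base {
public:
    virtual ~Base() = default;
    int data_ = 0;
};

class Derived : public Base {
public:
    void fun() const { std::cout << "Hi, I'm " << this << std::endl; }
};

// Checked downcast: returns nullptr instead of producing an invalid pointer
// when b does not actually point to a Derived.
Derived *asDerived(Base *b) {
    return dynamic_cast<Derived *>(b);
}
```

With the question's `Base base;`, `asDerived(&base)` yields `nullptr`, so `if (Derived *d = asDerived(&base)) d->fun();` simply skips the call. The cost is typically a vtable pointer in every `Base` and a run-time type check per cast.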

David Hammen
  • +1. This is a great post. Now it shows precisely what I've been saying in the comments: whether `fun` is virtual or not is irrelevant. – Nawaz Apr 23 '12 at 11:41
  • Agree with everything, but you have not given me an example of this **nice compiler** (and/or more complicated `fun` code). – anxieux Apr 23 '12 at 11:52
  • Fixed the comments in the original question. – anxieux Apr 23 '12 at 11:54

Run the same code an infinite number of times on the same machine and, if you're lucky, maybe you will see it working incorrectly and unexpectedly.

The thing to understand is that undefined behavior (UB) does not mean that it will definitely not run as expected; it might run as expected 1 time, 2 times, 10 times, even an infinite number of times. UB simply means it is not guaranteed to run as expected.

Nawaz
  • The problem here is that in real life the compiler generates very concrete binary code, and that code will always work correctly. – anxieux Apr 23 '12 at 10:01
  • You can see this from the disassembly. (Of course you need to do it on each platform separately.) – anxieux Apr 23 '12 at 10:05
  • @anxieux: You don't really understand what UB means, do you? – Nawaz Apr 23 '12 at 10:09
  • @anxieux it is undefined. You got LUCKY that the generated binary and your code match. There are many situations where UB is used IRL because its outcome is predictable due to different circumstances. But it does not mean that the next compiler you use will provide the same outcome (which would then be a defined behaviour). – RedX Apr 23 '12 at 10:10
  • This particular code will run perfectly EVERY time. Nawaz, I believe your understanding of UB is flawed. – Agent_L Apr 23 '12 at 10:12
  • @Agent_L: Explain UB and explain also why *"This particular code will run perfectly EVERY time"*. – Nawaz Apr 23 '12 at 10:12
  • @Nawaz, do you agree that the behavior of any binary code is well defined? It may depend on many external parameters, but it will be executed by the processor in a well-defined manner. – anxieux Apr 23 '12 at 10:17
  • @RedX, agreed. What I want is to find a compiler that will generate a broken binary for me :) – anxieux Apr 23 '12 at 10:18
  • @anxieux: To prove that you have to show me a binary code whose behaviour is not well-defined. – Nawaz Apr 23 '12 at 10:19
  • @Nawaz Every time means that once compiled, you can keep running the code from the question until the end of the Universe, and it will keep printing the correct value of `this`. There is nothing here that could crash. – Agent_L Apr 23 '12 at 10:20
  • @anxieux: then what does *undefined* mean? – Nawaz Apr 23 '12 at 10:58
  • @Nawaz The binary code itself is undefined: the compiler may generate anything. But once generated, the behavior of this code is well defined. – anxieux Apr 23 '12 at 11:04
  • @anxieux: which means no program can have undefined behavior at runtime because at runtime what actually runs is the generated binary which is well-defined (according to you). Is that what you want to say? – Nawaz Apr 23 '12 at 11:06
  • @Nawaz, yes and no :) Each binary is well defined in the following sense: you can look at the disassembly and understand what the code does (without any of the UB you have in C/C++). But this binary code may depend on many external parameters, which may differ from run to run. If you can figure out from the binary that it doesn't depend on any external parameters, you can be sure that it doesn't have any run-time UB. – anxieux Apr 23 '12 at 11:15
  • @anxieux: then post some binary code as an example to show this : *"But this binary code may depend on many external parameters, which may differ from run to run"*. Basically I want to know : what do you mean by "external parameters"? What kind of machine code express such parameters? – Nawaz Apr 23 '12 at 11:32
  • @Nawaz Any code which depends on user input or on disk data. A more complicated example is an uninitialized variable: its value depends on what was stored in memory before. – anxieux Apr 23 '12 at 11:37
  • @anxieux: Interesting. What kind of user-input or disk data? Please show me some code. I don't want to understand this so vaguely. – Nawaz Apr 23 '12 at 11:39
  • Just googled quite a simple one: [Assembly INT 13h - read disk](http://stackoverflow.com/questions/1989589/assembly-int-13h-read-disk-problem). There is a correct variant in the comments. – anxieux Apr 23 '12 at 11:56
  • @anxieux: What is there in that post? Which specific set of instructions (in the code posted in the linked topic) you think invokes UB? – Nawaz Apr 23 '12 at 11:59
  • @Nawaz You didn't ask for UB in binary code. Just some code which depends on external parameters. You can use this on-disk-data as a pointer to some memory. And depending on its value you will get seg fault (or not). – anxieux Apr 23 '12 at 12:04
  • @anxieux: I asked for UB. You came up with external parameters as response to that. – Nawaz Apr 23 '12 at 12:14
  • @Nawaz Sorry, I misunderstood you. But I don't understand why you want it. What exactly do you want to prove? – anxieux Apr 23 '12 at 12:32
  • Please move extended discussions to [chat] – Tim Post Apr 23 '12 at 12:51
  • @Nawaz OK, here is an example: [ASM code](http://pastebin.com/F6t4hAAh). If you assemble it (with nasm on Linux) and launch the binary without command-line arguments, it will work OK. But if you give it any arguments, a segmentation fault will (most probably) appear. – anxieux Apr 23 '12 at 13:39

You have to understand what your code is doing; then you can see it's doing nothing wrong. "this" is a hidden pointer, generated for you by the compiler.

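```cpp
#include <cstdint>
#include <iostream>

class Base
{
public:
    int data_;
};

class Derived : public Base
{

};

// What Derived::fun() boils down to: an ordinary function taking the hidden "this" pointer.
void fun(Derived* pThis)
{
    ::std::cout << "Hi, I'm " << pThis << ::std::endl;
}

// Because you're JUST getting the numerical value of a pointer, it can be the same as:
void fun(void* pThis)
{
    ::std::cout << "Hi, I'm " << pThis << ::std::endl;
}

// But hey, even this is still the same (printed in decimal rather than hex).
// uintptr_t rather than unsigned int, so 64-bit pointers are not truncated.
void fun(::std::uintptr_t pThis)
{
    ::std::cout << "Hi, I'm " << pThis << ::std::endl;
}
```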

Now it's obvious: this function cannot fail. You can even pass NULL, or a pointer to some other, completely unrelated class. The behaviour is undefined, but there is nothing that can go wrong here.

//Edit: OK, according to the Standard, the situations are not equivalent. `((Derived*)NULL)->fun();` is explicitly declared UB. However, this behaviour is usually defined in the compiler's documentation on calling conventions. I should have written "For all compilers that I know, nothing can go wrong."

Agent_L
  • -1 for "The behaviour is undefined, but there is nothing that can go wrong here." The behavior is undefined, so there are all kinds of things that can go wrong here. The compiler is free to interpret the call to `derived->fun();` as if the programmer had really meant to call `erase_my_hard_drive()`. – David Hammen Apr 23 '12 at 10:43
  • @DavidHammen After re-reading what I wrote, now I doubt it's even UB. Can you please point us to the spec? – Agent_L Apr 23 '12 at 10:51
  • @Agent_L I've added citation from the standard to the topic. – anxieux Apr 23 '12 at 11:05
  • OK, got it. But the more relevant part is 9.3.1 (C++2003): "If a nonstatic member function of a class X is called for an object that is not of type X, or of a type derived from X, the behavior is undefined." – Agent_L Apr 23 '12 at 11:13

For example, the compiler may optimize the code out. Consider a slightly different program:

```cpp
if (some_very_complex_condition)
{
  // here is your original snippet:

  Base base;
  Derived *derived = static_cast<Derived*>(&base); // Undefined behavior!

  derived->fun();
}
```

The compiler can

(1) detect the undefined behaviour

(2) assume that the program never exhibits undefined behavior

Therefore the compiler may conclude that `some_very_complex_condition` must always be false and, on that assumption, eliminate the whole block as unreachable.

[edit] A real-world example of how the compiler may eliminate code that "serves" a UB case:

Why does integer overflow on x86 with GCC cause an infinite loop?
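The question cited above is about signed integer overflow. Here is a minimal sketch of the same reasoning, with hypothetical function names: because signed overflow is undefined, an optimizing compiler may assume `x + 1` never wraps and fold the naive check to `false`, whereas the second version tests the precondition and is well defined for every `int`.

```cpp
#include <cassert>
#include <climits>

// Relies on signed overflow wrapping. Because signed overflow is undefined,
// an optimizer may assume x + 1 never wraps and fold this to `return false;`.
// Only meaningful for x < INT_MAX; for INT_MAX the behavior is undefined.
bool incrementOverflowsNaive(int x) {
    return x + 1 < x;
}

// Well-defined version: test the precondition before doing any arithmetic.
bool incrementOverflows(int x) {
    return x == INT_MAX;
}
```

Only `incrementOverflows` gives a meaningful answer for `INT_MAX`; calling the naive version with `INT_MAX` is itself undefined behavior, which is precisely why the compiler is allowed to optimize the check away.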

user396672
  • @Matthieu, can you adapt this example to my UB? I can't. – anxieux Apr 23 '12 at 14:26
  • @anxieux: I don't know of any compiler that will detect this kind of undefined behavior, so no. Indeed, if they did detect it I'd hope they would warn!! – Matthieu M. Apr 23 '12 at 14:31
The practical reason why this code often works is that anything which breaks this tends to be optimized out in release/optimized-for-performance builds. However, any compiler setting that focuses on finding errors (such as debug builds) is more likely to trip on this.

In those cases, your assumption ("note, that Derived shouldn't carry any additional data anyway") doesn't hold. In a debug build it definitely should carry extra data, to facilitate debugging.

A slightly more complicated example is even trickier:

```cpp
#include <iostream>

class Base {
public:
    int data_;
    virtual void bar() { std::cout << "Base\n"; }
};

class Derived : public Base {
public:
    void fun() { ::std::cout << "Hi, I'm " << this << ::std::endl; }
    virtual void bar() { std::cout << "Derived\n"; }
};

int main() {
    Base base;
    Derived *derived = static_cast<Derived*>(&base); // Undefined behavior!

    derived->fun();
    derived->bar();
}
```

Now a reasonable compiler may decide to skip the vtable and statically call Base::bar(), since that's the object you're calling bar() on. Or it may decide that derived must point to a real Derived since you called fun on it, skip the vtable, and call Derived::bar(). As you can see, both optimizations are quite reasonable given the circumstances.

And this shows why undefined behavior can be so surprising: compilers can make incorrect assumptions in the code following UB, even if the offending statement itself is compiled as expected.
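A well-defined way to see the second assumption is a sketch using `final` (C++11, so later than this thread); the helper names are hypothetical. Because no class can derive from a `final` `Derived`, a compiler may bind `d.bar()` on a `Derived&` directly to `Derived::bar()` without reading the vtable, which is justified only because a valid `Derived&` must refer to a real `Derived`. Whether a particular compiler actually devirtualizes is an optimization detail, not something the standard requires.

```cpp
#include <cassert>
#include <string>

class Base {
public:
    virtual ~Base() = default;
    virtual std::string bar() const { return "Base"; }
};

// `final`: nothing can derive from Derived, so a call through a Derived&
// may be bound statically to Derived::bar() instead of using the vtable.
class Derived final : public Base {
public:
    std::string bar() const override { return "Derived"; }
};

// Candidate for devirtualization: the static type is the final class.
std::string callThroughDerived(const Derived &d) { return d.bar(); }

// Ordinary virtual dispatch: the dynamic type decides.
std::string callThroughBase(const Base &b) { return b.bar(); }
```

Applied to the question's invalid cast, a statically bound call of this kind would run `Derived::bar()` on an object that is really a `Base`, while a vtable-based call would run `Base::bar()`.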

MSalters
  • I haven't seen any problems with debug configurations either. – anxieux Apr 23 '12 at 14:43
  • It works because the vtable pointers land in the same place. The memory layout of Base and Derived is identical, so they're "unsafely compatible". If Derived is polymorphic but Base is not, then it will crash, but that breaks the "Derived shouldn't carry any additional data" requirement. – Agent_L Apr 25 '12 at 10:38
  • @Agent_L: That assumes the compiler uses the vtable. As I noted, a smart compiler skips the vtable access since it's unnecessary. – MSalters Apr 25 '12 at 16:01
  • @MSalters - we're trying to crash the example, so I tried without optimizations. Surprisingly, MSVC uses the vtable, even in Release. http://codepad.org/W2kIHHhX – Agent_L Apr 25 '12 at 16:09
  • @Agent_L: The question was how a compiler could crash, not whether a specific version of a specific compiler crashes. Obviously if your MSVC version fails to spot the redundant vtable access, then the code will use the Base vtable. – MSalters Apr 25 '12 at 16:11
  • @MSalters : "But on all available machines (...) it works (...) Do anyone know configuration this code can work incorrectly on?" I believe the question is how to make it crash. (Demonstrate the UB) – Agent_L Apr 25 '12 at 16:14