
Suppose I have something along these lines:

class Base {
public:
  Base(int value) : value_(value) {}
  int getValue() const { return value_; }
protected:
  int value_;  // protected so that Derived's member functions can read it
};

class Derived : public Base {
public:
  // Derived only has non-virtual functions. No added data members.
  int getValueSquared() const { return value_ * value_; }
};

And I do the following:

Base* base = new Base(42);
Derived* derived = static_cast<Derived*>(base);
std::cout << derived->getValueSquared() << std::endl;

Strictly speaking, this is UB. Practically speaking, it works just fine.

Actual data members from Base (e.g., int value_) have to be located at the same offsets whether the object is an actual Base or an actual Derived (otherwise, good luck upcasting). And getValueSquared() isn't part of the actual memory footprint of a Derived instance, so it's not as if it will be "missing" or unconstructed in the in-memory Base object.
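The layout argument above can be checked for a *real* `Derived` object. A minimal sketch, assuming C++17 and the question's classes written with `value_` protected: because `Derived` adds no data members, both classes are standard-layout, so a `Derived` object and its `Base` subobject share one address. Note that this only covers an object that really is a `Derived`; it says nothing about a plain `new Base(42)`, which is the case in the question. The helpers `baseSubobjectSharesAddress` and `squareViaRealDerived` are hypothetical names used for illustration.

```cpp
#include <cassert>
#include <type_traits>

class Base {
public:
  Base(int value) : value_(value) {}
  int getValue() const { return value_; }
protected:
  int value_;  // protected so that Derived's member functions can read it
};

class Derived : public Base {
public:
  using Base::Base;  // inherit Base(int) so that a real Derived can be created
  // Derived only has non-virtual functions. No added data members.
  int getValueSquared() const { return value_ * value_; }
};

// With no data members added in Derived, both classes are standard-layout,
// so a Derived object and its Base subobject live at the same address.
static_assert(std::is_standard_layout_v<Base>);
static_assert(std::is_standard_layout_v<Derived>);

// Hypothetical helper: compares the upcast pointer with the object's address.
bool baseSubobjectSharesAddress(const Derived& d) {
  const Base* asBase = &d;  // implicit upcast, always well-defined
  return static_cast<const void*>(asBase) == static_cast<const void*>(&d);
}

// Hypothetical usage: the downcast is well-defined here only because the
// object really is a Derived, unlike `new Base(42)` in the question.
int squareViaRealDerived(int v) {
  Derived d(v);
  Base* base = &d;
  assert(baseSubobjectSharesAddress(d));
  Derived* back = static_cast<Derived*>(base);  // valid: *base is part of a Derived
  return back->getValueSquared();
}
```

The same `static_cast` that is fine in `squareViaRealDerived` is the one that is undefined in the question, because there no `Derived` object was ever created.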

I know that UB is all the reason I need not to do this but, logically, it seems it would always work. So, why not?

I am asking because it seems like an interesting quirk to discuss...not because I plan on using it in production.

Jarod42
Matthew M.
  • Global functions are a better design than such a derived class. – Eugene Dec 08 '21 at 04:31
  • Pretty much anything is a better design than this... but will it ever practically produce UB? Or is this UB only on paper? – Matthew M. Dec 08 '21 at 04:35
  • Is it UB? I’m not a language lawyer but it seems possible it isn’t in that it’s accessing the `Base` part of itself which really is there. – Ben Dec 08 '21 at 04:46
  • @Ben Agreed. But I think that just makes this UB that never actually misbehaves. I wish a language lawyer would come tell me that it isn't even UB at all... because it would be pretty useful to me right now. – Matthew M. Dec 08 '21 at 04:50
  • You should really go all-in on your design: `int x = 42; cout << reinterpret_cast<Derived&>(x).getValueSquared();` – paddy Dec 08 '21 at 04:52
  • It either is or isn’t UB according to the standard. It feels like the common-initial-sequence rule. – Ben Dec 08 '21 at 04:52
  • I haven't examined the example closely enough to say, but maybe [CRTP](https://stackoverflow.com/q/4173254/5987) could solve the problem without UB. – Mark Ransom Dec 08 '21 at 04:54
  • @MarkRansom I had considered CRTP but it's not what I need here. I could use placement-new. `Derived* derived = new (base) Derived(*base);` given `sizeof(Derived) == sizeof(Base)` and the required conversion constructor. – Matthew M. Dec 08 '21 at 05:36
  • This would be UB too. – Eugene Dec 08 '21 at 05:41
  • Common initial sequence only applies to (whatever PODs are called this week). – n. m. could be an AI Dec 08 '21 at 05:56
  • @Eugene Because of not destructing `base` first before overwriting it? Or because of potential alignment differences? Or? – Matthew M. Dec 08 '21 at 06:30
  • *"So, why not?"*: compilers tend to detect more and more UB, and discard those "dead" branches (as UB should not happen). – Jarod42 Dec 08 '21 at 09:20
  • @jarod42 I'm not sure I follow. If code invokes UB then the only way for a compiler to discard/avoid UB is not to compile it at all (i.e., compilation error). A compiler choosing to "discard dead branches" is, in itself, UB... – Matthew M. Dec 08 '21 at 14:48
  • `Derived* derived = new (base) Derived(*base);` is UB because it assumes that in ctor both Base and Derived objects located in the same memory are alive. Lifetime of Base should have ended before the placement `new` call. – Eugene Dec 08 '21 at 15:07
  • Your concern is that the compiler would probably do the "right" thing (reinterpret the memory as the expected type) in practice, but my point is that compilers do remove branches that contain UB (such branches can legitimately appear after macro expansion, even in correct code). So discarding the whole branch is a possible outcome in practice. – Jarod42 Dec 08 '21 at 15:07
  • @Eugene Okay, that's what I figured you meant. I was being a bit too terse with that code but understand ... say, `Base baseCopy(*base); base->~Base(); Derived* derived = new (base) Derived(baseCopy);` – Matthew M. Dec 08 '21 at 15:50
  • @Jarod42 Okay. So, you're just saying that some compilers might handle UB, if it's detected, by discarding those lines of code and doing nothing. But doing nothing is still a Behavior. So, it would still be producing UB. – Matthew M. Dec 08 '21 at 15:56
  • I might not understand your question then... The code is UB, so pedantically anything can happen, but you seem concerned with "in practice" (for which I pinpointed a case where it would not appear to work). – Jarod42 Dec 08 '21 at 16:25
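The first comment suggests global (free) functions instead of the empty derived class. A minimal sketch of that alternative, assuming the question's `Base` interface; `valueSquared` is a hypothetical name, and `constexpr` is added only so the result can be checked at compile time. The function works through `Base`'s public interface, so no cast (and no UB) is involved.

```cpp
class Base {
public:
  constexpr Base(int value) : value_(value) {}
  constexpr int getValue() const { return value_; }
private:
  int value_;
};

// Hypothetical free function replacing Derived::getValueSquared().
// It relies only on Base's public interface, so it accepts any Base object
// (or the Base part of any derived object) without a cast.
constexpr int valueSquared(const Base& b) {
  const int v = b.getValue();
  return v * v;
}
```

With this, the question's snippet becomes `std::cout << valueSquared(*base) << std::endl;` with no downcast at all, and `value_` can stay private.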

2 Answers


In practice, most compilers will convert a non-virtual member function into a static function with a hidden this parameter. As long as the function doesn't use any data members that aren't part of the base class, it will probably work.
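To illustrate, here is a hand-written sketch of that conceptual lowering for `Derived::getValueSquared()`: an ordinary function whose hidden `this` becomes an explicit pointer parameter. This is an assumption-level illustration, not the actual output of any compiler, and `Derived_getValueSquared` is a hypothetical name. Because the body only reads `value_`, which sits in the `Base` part, the generated code touches only memory that a plain `Base` also has, which is why the cast usually appears to work.

```cpp
class Base {
public:
  constexpr Base(int value) : value_(value) {}
  constexpr int getValue() const { return value_; }
protected:
  int value_;
};

class Derived : public Base {
public:
  using Base::Base;
  constexpr int getValueSquared() const { return value_ * value_; }

  // Hypothetical stand-in for the lowered form; a friend so it may read value_.
  friend constexpr int Derived_getValueSquared(const Derived* self);
};

// Conceptual lowering: `this` becomes the explicit parameter `self`.
// The body dereferences only self->value_, i.e. memory inside the Base part.
constexpr int Derived_getValueSquared(const Derived* self) {
  return self->value_ * self->value_;
}
```

The compiler is still allowed to reason about the static type `Derived` when optimizing, which is where the "it works in practice" argument can break down.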

The problem with UB is that you can't predict it. Something that worked yesterday can fail today, with no rhyme or reason behind it. The compiler is given a lot of latitude on how to interpret anything that's technically undefined, and the race to find better optimizations means that unexpected changes can happen suddenly. Murphy's law says that these changes will be most evident when you're demoing the software to your most important boss or biggest customer.

Mark Ransom
  • Agreed fully on the general precautions against UB. – Matthew M. Dec 08 '21 at 04:51
  • I'll accept this as the answer. I was hoping to prompt a little discussion on what compilers would actually do with this and get some theories on how this would actually fail, in the wild, in more concrete terms. Surely, if Base and Derived were POD structs with compatible layouts, their in-memory representations would be the same. But maybe this is a bit like Heisenberg uncertainty: The in-memory representations are only guaranteed to be what they "should" be when I look at them (properly). Otherwise, the compiler is free to represent them however it wants (if at all). Anyways, thanks. – Matthew M. Dec 08 '21 at 14:34

It is UB indeed, and aliasing rules can make it break in practice:

#include <iostream>

struct Base {
    int value = 0;
};

struct Derived1 : Base {
    void inc10() { value += 10; }
};

struct Derived2 : Base {
    void inc20() { value += 20; }
};

void doit(Derived1 *d1, Derived2 *d2) {
    std::cout << (&d1->value == &d2->value) << "\n";
    d1->inc10();
    d2->inc20();
    std::cout << d1->value << " " << d2->value << "\n";
}

int main() {
    Base b;
    doit(static_cast<Derived1*>(&b), static_cast<Derived2*>(&b));
    std::cout << b.value << "\n";
}

With GCC 11.2.0, compiled with `g++ a.cpp -O2 -o a`, the program prints

1
10 30
30

GCC is free to assume that `Derived1*` and `Derived2*` point to different objects, so it optimizes away the re-read of `d1->value` after calling `d2->inc20()`, because the latter supposedly cannot affect the former.

Clang 13.0.0 does not exhibit this behavior, and that is equally fine: with UB, either result is allowed.

Link to Godbolt
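For contrast, a sketch of the same sequence without casting to types the object never had, assuming the increments are rewritten as free functions on `Base&` (`doitWellDefined` is a hypothetical name; `constexpr` is added only so the result can be checked at compile time). Both parameters now have the same type, so the compiler must assume they may alias, and every conforming compiler has to produce 30.

```cpp
struct Base {
    int value = 0;
};

// Free functions on Base& replace the members of the unrelated types
// Derived1 and Derived2, so no cast to a type the object never had is needed.
constexpr void inc10(Base& b) { b.value += 10; }
constexpr void inc20(Base& b) { b.value += 20; }

// Hypothetical counterpart of doit(): b1 and b2 have the same type, so the
// compiler must assume they may refer to the same object and cannot reuse
// a cached b1.value after inc20(b2).
constexpr int doitWellDefined(Base& b1, Base& b2) {
    inc10(b1);
    inc20(b2);
    return b1.value;  // 30 when b1 and b2 are the same object
}
```

The only change from the original is that the two views of the object no longer have unrelated static types, which removes the assumption GCC exploited.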

yeputons
  • I'm not surprised by this behavior (seen it before in other scenarios) but... aliasing in itself isn't UB. Hence the `__restrict` keyword to communicate to the compiler when aliasing _definitely_ isn't occurring. – Matthew M. Dec 08 '21 at 05:13
  • @MatthewM. I'm not sure I follow. There is no need for `__restrict`: GCC already behaves as if it were there and assumes there can be no aliasing. And having two different values at the same memory location (`d1->value` and `d2->value` have the same address but different values at the end of `doit()`) seems UB enough for me. – yeputons Dec 08 '21 at 05:35
  • Yes, you have produced UB with your example... which is different from my example... is my point. I wasn't asking about deliberately aliasing two pointers to the same object. – Matthew M. Dec 08 '21 at 06:07