1

I'm trying to understand exactly how casting between base and derived types works in C++, so I wrote a small proof-of-concept program:

#include <iostream>

class Foo {
public:
  Foo() {}
};

// Bar is a subclass of Foo
class Bar : public Foo {
public:
  Bar() : Foo() {}
  void bar() { std::cout << "bar" << std::endl; }
  void bar2() { std::cout << "bar with " << i << std::endl; }

private:
  int i = 0;
};

where Foo is the base and Bar is derived from Foo.

Currently, my understanding of casting is:

  • Cast is a runtime thing. Compiler can do us a favor by checking them during compilation, but the actual type conversion occurs during runtime
  • Upcast (e.g. Foo f = Bar()), either explicit or implicit, should be always fine
  • Downcast (e.g. Bar b = Foo()) is prohibited in C++, although we can enforce the cast by using static_cast

I wrote 3 different programs to verify my understanding. Each program is compiled using

g++ -std=c++17 -Wall -Wextra -pedantic

Case #1

int main() {
  Foo f;
  Bar &b = static_cast<Bar &>(f);
  return 0;
}

Code compiles successfully. Running this program does not result in any error.

My thoughts: OK, although the cast is not right, since we are treating an instance of Foo as a Bar at runtime, we don't see any error because we don't really operate on b.

Case #2

int main() {
  Foo f;
  Bar &b = static_cast<Bar &>(f);
  b.bar();
  return 0;
}

Code compiles successfully. Running this program will not result in any error, and "bar" is printed

I start to get confused here: why does this program work at all, and why does "bar" get printed? Although we are treating a Foo as a Bar, the underlying instance is still a Foo, and it has no method named "bar" defined on it. How can this code work?

Case #3

int main() {
  Foo f;
  Bar &b = static_cast<Bar &>(f);
  b.bar2();
  return 0;
}

Code compiles successfully. Running this program will not result in any error, and "bar with 1981882368" (some random number) is printed

I'm even more confused here: if we think in terms of memory layout, the underlying Foo instance has no space reserved for the member i, which is defined in Bar. How can this code still work?

Please help me understand the programs above! Thanks in advance!

torez233
  • 193
  • 8
  • 11
    Your `static_cast` is telling the compiler "shut up about type safety, I know what I'm doing and `f` is definitely a `Bar`". But `f` is not a `Bar`, so you lied to your compiler. Your code is just broken, and any behavior you observe is meaningless. Remember that in C++ you are not guaranteed to get an error when you do something wrong. And casting is a good way to reduce the compiler's ability to tell you about problems. In C++ you often just get Undefined Behavior, where the result could be anything and can't be relied on. – François Andrieux Dec 22 '22 at 20:34
  • 5
    Neither `Foo f = Bar();` nor `Bar b = Foo();` has a cast. Both have **conversions**. A cast is something you write in your source code to tell the compiler to do a conversion. So, a cast cannot be "either explicit or implicit"; it's always explicit. A **conversion** can be explicit (e.g., when written with a cast) or implicit. – Pete Becker Dec 22 '22 at 20:35
  • 2
    in this case you are guaranteed an error/warning -- if you don't suppress it with a cast! – M.M Dec 22 '22 at 20:35
  • 2
    *Code compiles successfully. Running this program will not result in any error* -- And so does [this program](https://godbolt.org/z/6E7xav9ro). You do something crazy, there is no guarantee what the results will be. – PaulMcKenzie Dec 22 '22 at 20:37
  • 1
    Except for `dynamic_cast`, casting is a compile-time thing. The compiler knows what you are converting from and what you are converting to. Not sure what you mean by "cast" - not sure what you mean by "runtime". – franji1 Dec 22 '22 at 20:43
  • 3
    *So I wrote a small proof-of-concept program* -- This is where you will get into trouble. You cannot determine if your program is valid by doing this. To know if your program is valid requires experience. Writing code to prove or disprove isn't going to get the job done. – PaulMcKenzie Dec 22 '22 at 20:45
  • 2
    Getting "no error" in *no way* guarantees correct behaviour of your program. – Jesper Juhl Dec 22 '22 at 20:47
  • 1
    @franji1, `static_cast<int>(some_float)` is one example of a different cast that can't be a compile-time thing because it needs to produce an `int` with the same value. The cast itself generates runtime code to do the conversion unless the compiler knows the value ahead of time and optimizes it. – chris Dec 22 '22 at 20:48
  • 2
    @OP -- Looking at your profile, you don't have a lot of experience with C++. The difference between all of those other languages and C++ is that you have something called "undefined behavior", something that doesn't exist in most other languages. This is why writing proof-of-concept programs to determine how C++ will behave will not work, unlike those other languages that have a set way of doing things. For example, if you go out-of-bounds of an array in Java, you get an exception thrown -- in C++, you don't know what will happen. – PaulMcKenzie Dec 22 '22 at 21:00
  • @PaulMcKenzie thanks for all the insight, you are absolutely right: I used to program mainly in Java and I recently picked up C++. Like you said, a confusing part of C++ for me is that if one does something prohibited by the language standard, the resulting program may not produce an explicit error. When I learn something about C++, or write something in C++, I unconsciously expect what will happen based on the result of doing a similar thing in Java. Java is pretty strict in the sense that if one does something prohibited by the JLS, an exception is guaranteed to be thrown. Again, thanks – torez233 Dec 22 '22 at 22:09

2 Answers

4

Cast is a runtime thing. Compiler can do us a favor by checking them during compilation, but the actual type conversion occurs during runtime

No, with the exception of dynamic_cast, all casts are pure compile-time constructs. After all, there are (almost) no types at runtime in C++.

Upcast (e.g. Foo f = Bar()), either explicit or implicit, should be always fine

Yes, upcasts are safe.

Downcast (e.g. Bar b = Foo()) is prohibited in C++, although we can enforce the cast by using static_cast

No, it is not prohibited, there are just some non-trivial rules. Some casts/conversions are implicit, some must be requested explicitly.


Case 1: This is undefined behaviour (UB) because b does not refer to a real Bar object. static_cast assumes that the user knows what they are doing, perhaps because they have some external information about the true type of the object, although that is not the case here.

Case 2: You have triggered the UB, so anything can happen. In this case, the compiler likely just called bar and passed the address of the object as the this pointer. That is of course incorrect, but that is your problem. Since the method does not use the this pointer, there is not much to break.

Case 3: Well, now you are really digging into this UB. The compiler likely just calculated this+offsetof(Bar,i) and assumed the address points to an integer. The fact that it does not is your problem for breaking the promise of downcasting to the correct type.

Quimby
  • 17,735
  • 4
  • 35
  • 55
  • 2
    The behaviour of the cast itself is undefined (see [expr.static.cast] in the standard) – M.M Dec 22 '22 at 20:43
  • "_pure compile-time constructs_": That can be easily misunderstood to mean that there will be no instructions emitted for them or something like this. I guess you mean that it is the only one requiring non-trivial operations at runtime or something like that? – user17732522 Dec 22 '22 at 20:44
  • @user17732522 I meant that the logic/rules of the cast happen at compile-time. – Quimby Dec 22 '22 at 20:50
  • @M.M Hmm, you might be right, I lived under the impression that any `&`/`*` cast is safe as long as you do not use the result. – Quimby Dec 22 '22 at 20:51
  • 2
    @Quimby generally speaking, creating a reference is only well-defined if it refers to an object that exists (and the strict aliasing rule is satisfied too) – M.M Dec 22 '22 at 21:12
  • 1
    @Quimby Many thanks for the insight. I used to program mainly in Java, and I think much of my confusion comes from the fact that in Java, if one does a similar downcast, a very explicit exception (e.g. ClassCastException) is always thrown when the program is executed, and the program won't run to completion with a return status of 0 like it does in this case – torez233 Dec 22 '22 at 21:52
  • @torez233 Fair enough, closest to Java cast is `dynamic_cast`, but it requires at least one virtual function present in `T` because the implementation relies on vtable to deduce the types. – Quimby Dec 23 '22 at 07:24
2

I gather this question is about how this was actually able to happen, more than about what the C++ language standard says should happen.

So let's look at what happened, in an example program compiled for x64 without optimization (same as you did):

#include <iostream>

class Foo {
public:
  Foo() {}
};

// Bar is a subclass of Foo
class Bar : public Foo {
public:
  Bar() : Foo() {}
  void bar() { std::cout << "bar" << std::endl; }
  void bar2() { std::cout << "bar with " << i << std::endl; }

private:
  int i = 0;
};

int main() {
  Foo f;
  Bar &b = static_cast<Bar &>(f);
  b.bar2();
  return 0;
}

Relevant parts in assembly:

_ZN3Bar4bar2Ev:
        push    rbp
        mov     rbp, rsp
        sub     rsp, 16
        mov     QWORD PTR [rbp-8], rdi
        mov     esi, OFFSET FLAT:.LC0
        mov     edi, OFFSET FLAT:_ZSt4cout
        call    _ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc
        mov     rdx, rax
        mov     rax, QWORD PTR [rbp-8]
        mov     eax, DWORD PTR [rax]
        mov     esi, eax
        mov     rdi, rdx
        call    _ZNSolsEi
        mov     esi, OFFSET FLAT:_ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_
        mov     rdi, rax
        call    _ZNSolsEPFRSoS_E
        nop
        leave
        ret
main:
        push    rbp
        mov     rbp, rsp
        sub     rsp, 16
        lea     rax, [rbp-9]
        mov     rdi, rax
        call    _ZN3FooC1Ev
        lea     rax, [rbp-9]
        mov     QWORD PTR [rbp-8], rax
        mov     rax, QWORD PTR [rbp-8]
        mov     rdi, rax
        call    _ZN3Bar4bar2Ev
        mov     eax, 0
        leave
        ret

Important things to see here are:

  • The call to bar2 is written as call _ZN3Bar4bar2Ev. Nothing about that physically requires an instance of Bar. Methods "belonging to" classes is a high-level illusion; it's not as if they're actually packaged inside there in any real sense. There is really just a function with a funny (mangled) name, and it expects a pointer to an object of the appropriate type as a hidden parameter, but you can go ahead and violate its expectations. Of course, unexpected things may happen when you do that, since bar2 is just going to forge ahead blindly, regardless of what junk it receives as its implicit this-parameter.
    By the way, things would be a bit different with a virtual call. Even those do not rely on the method name though, and they also won't check whether the object that you're calling the method on actually has a sensible type. I won't go too deeply into virtual calls since they were not part of the question; you can read some other QAs such as How are virtual functions and vtable implemented?.
  • bar2 accesses the member i like this: mov eax, DWORD PTR [rax], i.e. it loads a 4-byte quantity from an offset of zero from whatever address it received (whatever bar2 receives as its hidden first parameter, even if it is not an address, will be used by that mov as if it is an address). No types are involved, no member names are involved, no checks are made. Memory is accessed blindly, and whatever happens, happens.

This is all quite tame - even though various rules were broken, the "default thing" (proceeding as if nothing was wrong and letting the results be whatever "naturally" happens) happened anyway. That is somewhat common (but not universal, and not guaranteed) when compiling without optimizations. It may even happen when compiling with optimizations, but then you're more likely to see various compiler shenanigans.

harold
  • 61,398
  • 6
  • 86
  • 164
  • many thanks, based on your answer and the other answer in this post, I now realize that there is no (or almost no) such thing as "types" at runtime. Types are something offered by the compiler to help people write correct programs, but all such typing dissolves during compilation. What is left to the assembler is just a bunch of low-level instructions to load/write memory/registers. It's ironic that even though I know some bits of assembly (x86), I never attempted to correlate the source code with the compiled assembly code when I hit this issue – torez233 Dec 22 '22 at 22:23
  • @torez233 that is indeed mostly the case. For an example of the opposite, there is RTTI and `dynamic_cast` – harold Dec 22 '22 at 22:29