struct base {
    virtual void vcall() = 0;
};

struct foo final : base {
    void vcall() final;  // declared only; no definition in this TU
};

void call_base(base& b) {
    b.vcall();
}

void call_foo(foo& f) {
    call_base(f);
}

void call_foo_directly(foo& f) {
    f.vcall();
}

clang 16 produces:

call_base(base&):
        mov     rax, qword ptr [rdi]
        jmp     qword ptr [rax]
call_foo(foo&):
        mov     rax, qword ptr [rdi]
        jmp     qword ptr [rax]
call_foo_directly(foo&):
        jmp     foo::vcall()@PLT

GCC and MSVC produce the same result, so it's not a problem limited to clang. Shouldn't it be possible for call_foo to contain a non-virtual call to foo::vcall() too? Is this a missed optimization, or is it possible for the call to be virtual?

See live example on Compiler Explorer.
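A minimal sketch of why the optimization looks legal. It assumes `vcall` has a body that bumps a hypothetical `calls` counter, only so the snippet runs. With a body visible, compilers may inline everything, so to inspect virtual versus direct calls in the assembly you would keep `vcall` declared but undefined, as in the question. The `bar` class is also hypothetical. It shows why `call_base` on its own has to stay virtual while `call_foo` does not.

```cpp
#include <cassert>

struct base {
    virtual void vcall() = 0;
};

struct foo final : base {
    int calls = 0;                   // illustrative counter, not in the question
    void vcall() final { ++calls; }
};

struct bar : base {                  // hypothetical second derived class
    int calls = 0;
    void vcall() override { ++calls; }
};

// Must stay a virtual call: b may refer to a foo, a bar, or anything else derived from base.
void call_base(base& b) {
    b.vcall();
}

// Once call_base is inlined here, the argument is known to be a foo&. Since foo is
// final, the object's dynamic type is exactly foo, so the call can only reach foo::vcall.
void call_foo(foo& f) {
    call_base(f);
}

// What a devirtualized call_foo amounts to: a qualified call, which is never
// dispatched through the vtable.
void call_foo_by_hand(foo& f) {
    f.foo::vcall();
}
```

Both `call_foo` and `call_foo_by_hand` end up in `foo::vcall`, which is the equivalence the question expects the optimizer to exploit.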

Jan Schultke
  • It's probably not worth the effort. Typically you don't know the dynamic type when dealing with polymorphism. Note that while `foo` might be final, other things can derive from `base` – NathanOliver Jun 19 '23 at 21:08
  • @NathanOliver-IsonStrike wouldn't you know the dynamic type automatically when inlining `call_base` into `call_foo`? The compiler is clearly able to make this optimization locally. – Jan Schultke Jun 19 '23 at 21:09
  • Most likely a phase-ordering issue that would make this harder than it seems -- the compiler(s) are probably doing the virtual-to-direct optimization before inlining, since that order would then allow them to inline the now direct calls. Probably the payoff from that is better than the benefit you could get from this. – Chris Dodd Jun 19 '23 at 21:30
  • Considering clang doesn't devirtualize `base& b = f; b.vcall()` but does devirtualize `((base&) f).vcall()`, seems like a missed optimization. – Artyer Jun 19 '23 at 21:43
  • Devirtualization after inlining seems to be known as a commonly missed optimization: https://lists.llvm.org/pipermail/llvm-dev/2019-May/132222.html, https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91771, https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89924 – user17732522 Jun 20 '23 at 00:08
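The two spellings compared in Artyer's comment can be written side by side as below. At run time both reach `foo::vcall`; the reported difference is only in the generated code, where clang reportedly keeps an indirect call for the named `base&` and emits a direct call for the cast. The function names and the `calls` counter are hypothetical, added so the sketch runs; remove the body of `vcall` to compare the call instructions on Compiler Explorer.

```cpp
#include <cassert>

struct base {
    virtual void vcall() = 0;
};

struct foo final : base {
    int calls = 0;                   // illustrative counter, not in the original
    void vcall() final { ++calls; }
};

// Reportedly left as a virtual call by clang: the call goes through a named base&.
void via_named_ref(foo& f) {
    base& b = f;
    b.vcall();
}

// Reportedly devirtualized by clang: the base expression is a cast of f itself.
void via_cast(foo& f) {
    ((base&) f).vcall();
}
```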

1 Answer


The compiler does try, but there needs to be something to inline: if a function has no implementation, it's just an empty call, and that's what gets compiled. Adding `final` just prevents further overriding later on. To compile this with optimizations, `volatile` is pretty much required so that everything isn't optimized away.

Run this in Godbolt:

struct base {
    volatile int num = 111;
    virtual void vcall() = 0;
};

struct foo final : base {
    void vcall() {
        num += 222;
    };
};

void call_base(base& b) {
    b.vcall();
}
void call_foo(foo& f) {
    call_base(f);
}

void call_foo_directly(foo& f) {
    f.vcall();
}

void main_func(void) {
    foo val;
    call_foo(val);
    call_foo_directly(val);
}

This is partial clang 15 disassembly at -O3 (same at -O2); MSVC (Visual Studio) couldn't inline `call_foo`.

main_func():                          # @main_func()
        mov     dword ptr [rsp - 8], 111
        add     dword ptr [rsp - 8], 222
        add     dword ptr [rsp - 8], 222
        ret
SrPanda
  • You can inspect optimizations by not defining functions, so the `+= 222` volatile stuff isn't really necessary here. You can tell a virtual call apart from a non-virtual call by whether it's a direct or an indirect call. When declaring an object in `main` and making virtual calls through it, the compiler does optimize it, but that's not the scenario I'm curious about. `final` means that we 100% know that when calling through the derived class, we can make direct calls instead of virtual calls, because there cannot be any overriding function. – Jan Schultke Jun 20 '23 at 20:41
  • ???, this isn't Java. If you remove the `volatile`, clang will compile `main_func` to just `ret`, and there is no `main` in the example; just because the function has "main" in its name doesn't make it the entry point. You should check out what [final](https://en.cppreference.com/w/cpp/language/final) means in C++. – SrPanda Jun 20 '23 at 21:26
  • I didn't say that this is Java. The compiler can't optimize out anything that doesn't have a definition in the current TU. Just look at https://godbolt.org/z/G6vWP8vWf and you will see that there are two non-virtual calls in `main`. You can tell that it's non-virtual because it's a direct call instruction `call foo::vcall()@PLT`, so we are not making a call to an address fetched from the vtable. – Jan Schultke Jun 20 '23 at 21:35
  • @SrPanda Your answer misses the point of the question. The compiler ought to devirtualize the call even if the definitions of the functions aren't available in the current TU. Giving it any definition, whether with volatile access or without, masks whether the optimization that OP is interested in is applied. Inlining the devirtualized call is not the goal. – user17732522 Jun 20 '23 at 21:45
  • @JanSchultke can you explain to me what you mean by *when inlining*? – SrPanda Jun 20 '23 at 21:49
  • `call_base` is obviously being inlined into `call_foo` judging by the assembly, but no devirtualization takes place even though it should be possible. – Jan Schultke Jun 20 '23 at 21:51
  • I'm not sure about that. How can the compiler optimize something that is not implemented? With an empty body, like [here](https://godbolt.org/z/rraKqGKao), most of it is removed; it's really odd to me that the compiler would remove chunks of code without making sure that they are actually unnecessary. – SrPanda Jun 20 '23 at 22:08
  • @SrPanda it doesn't eliminate the call, but it devirtualizes the call. *Look at the assembly!* A virtual call consists of a `mov` from the vtable and an indirect call; a non-virtual call is simply a `call` instruction with a constant address. This question is not about optimizing functions away through inlining, it's about devirtualizing function calls (although devirtualizing is a prerequisite for eliminating calls to empty functions, so the two are related a little bit). – Jan Schultke Jun 20 '23 at 23:17
  • I don't think I can give you the solution you are looking for, so I'll leave it here. As a final note, if you only care about the "extra" calls in the assembly, just make [everything inline](https://godbolt.org/z/sfEqfq8Gs). – SrPanda Jun 20 '23 at 23:55
  • @JanSchultke: Wording nitpick: the `mov rax, [rdi]` is loading a *pointer to* the vtable, not *from* the vtable itself. Loading a function pointer from the vtable (into RIP) happens via the memory operand for the memory-indirect `jmp qword ptr [rax]`. On RISC ISAs like AArch64, there'd be two loads and a jump with a register operand. But yes, fully inlining is something compilers can do in this case where they fail to devirtualize. :/ It's somewhat interesting that the same caller does definitely see through the virtualization when inlining is possible. – Peter Cordes Jun 20 '23 at 23:56
  • @SrPanda: Yes, of course if you have small functions that are visible at compile time, the best thing is for them to be fully inlined as well as devirtualized. We don't expect that there is a workaround to get compilers to do this missed optimization, that's why the question is phrased as "why don't they?", asking what about compiler internals makes it hard for them to do this when they can inline in the same case. We know they devirtualize in some cases. It's not the `call` that we're trying to avoid, it's the extra load and the fact that it's *indirect* (`call r/m64` vs `call rel32`) – Peter Cordes Jun 20 '23 at 23:59
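To make the load-then-indirect-call distinction from the last few comments concrete, here is a rough hand-written model of what a virtual call compiles to. It is a simplified sketch under stated assumptions: real ABIs (such as the Itanium C++ ABI used by clang and GCC on x86-64) put more into a vtable than a single slot, and every name here (`object`, `vtable`, `foo_vcall`, `virtual_call`, `direct_call`) is hypothetical. The assembly in the comments is only an approximate correspondence to the question's output.

```cpp
#include <cassert>

struct object;                         // hypothetical stand-in for a polymorphic object
using vcall_fn = int (*)(object&);     // type of a single vtable slot

struct vtable {
    vcall_fn vcall;                    // one slot per virtual function
};

struct object {
    const vtable* vptr;                // the hidden vtable pointer, typically at offset 0
    int value;
};

int foo_vcall(object& self) { return self.value + 1; }

constexpr vtable foo_vtable{&foo_vcall};

// Roughly what `b.vcall()` becomes: load the vtable pointer, load the slot, call indirectly.
int virtual_call(object& b) {
    const vtable* vt = b.vptr;         // ~ mov rax, qword ptr [rdi]
    return vt->vcall(b);               // ~ jmp qword ptr [rax]
}

// Roughly what a devirtualized call becomes: a direct call to a known function.
int direct_call(object& f) {
    return foo_vcall(f);               // ~ jmp foo::vcall()@PLT
}
```

Both paths compute the same result; the only difference is that `virtual_call` needs two dependent loads before it knows where to jump, which is exactly the cost devirtualization removes.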