
Let's say there is an abstract class like this:

class Base {
   public:

   virtual void f() = 0;

   virtual ~Base()  = default;
};

And some function:

void function (Base& x, bool flag1, bool flag2, bool flag3) {

   if(flag1)
      x.f();
   if(flag2)
      x.f();
   if(flag3)
      x.f();
}

In main() I load an instance of a class derived from Base from a shared library:

int main() {
  Base* x = /* load from shared lib*/;

  bool flag1 = getchar() == '1';
  bool flag2 = getchar() == '2';
  bool flag3 = getchar() == '3';

  function(*x, flag1, flag2, flag3);

  return 0;
}

Question: can I expect that within one call to void function (Base& x, bool flag1, bool flag2, bool flag3) the virtual function table will be accessed only once, even if all three flags are true? That is, can the compiler look up the function in the table once and reuse its address for the other two calls?

P.S. Loading the instance from a shared library is just an example, chosen to rule out the possibility of f() being inlined.
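For anyone who wants to look at the generated code, the setup above can be condensed into a self-contained sketch like the one below. Derived, its calls counter, and the count_calls() helper are hypothetical stand-ins for the class that would really come from the shared library; they exist only so that the snippet compiles on its own.

```cpp
class Base {
public:
    virtual void f() = 0;
    virtual ~Base() = default;
};

// The function from the question: up to three virtual calls on the same object.
void function(Base& x, bool flag1, bool flag2, bool flag3) {
    if (flag1) x.f();
    if (flag2) x.f();
    if (flag3) x.f();
}

// Hypothetical stand-in for the class that would live in the shared library.
class Derived : public Base {
public:
    int calls = 0;
    void f() override { ++calls; }
};

// Hypothetical helper: runs function() on a fresh Derived and
// returns how many times f() was actually invoked.
int count_calls(bool flag1, bool flag2, bool flag3) {
    Derived d;
    function(d, flag1, flag2, flag3);
    return d.calls;
}
```

Because Derived is visible in the same translation unit here, a compiler may devirtualize the calls inside count_calls(); the standalone code emitted for function() itself is the part that corresponds to the question.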

Peter Cordes
  • _"...Can the compiler..."_ For this simple example, and under [the as-if rule](https://en.cppreference.com/w/cpp/language/as_if), maybe/probably. The only way to make sure with more complex code would be to inspect the generated optimized assembly code. If this is a performance question, be aware of premature optimization, and make sure you profile the code with realistic data before changing anything. – Richard Critten Mar 23 '23 at 14:24
  • why do you care about what the compiler does? – Raildex Mar 23 '23 at 14:36
  • I think in theory this is an optimization the compiler is allowed to do, but practically they probably won't because the first call to `f` has access to all of the object's memory and might manipulate the vtable pointer (even if that would be UB per standard). Virtual functions aren't very optimization-friendly in general. – user17732522 Mar 23 '23 at 14:40
  • I'm betting a six pack of beer that this is a µ-optimization that on any real system will have close to zero measurable effect. Since for a given object the vtable will not "wildly" jump around, and by definition the offset for the entry is also constant, the address loads for the indirect jumps will in close proximity load the same value, through the same registers from the same location. This will either be caught by L1 cache or by short cutting the speculative prefetch. – datenwolf Mar 23 '23 at 15:08
  • It isn't obvious to the compiler that calling x.f() doesn't write to memory in a way that changes which function f is called the next time, even if you just write `x.f();x.f();`. – Marc Glisse Mar 23 '23 at 15:12
  • @MarcGlisse: Is that *defined* behavior? AFAIK the C++ standard leaves how exactly virtual functions are implemented as an implementation detail. As I see it, modifications to the vtable by code that is not part of the language implementation are not defined by the standard; they are therefore UB and hence fair game for the compiler to optimize around. – datenwolf Mar 23 '23 at 15:19
  • @datenwolf: To be fair, reloading the vtable pointer and then deref of it is a chain of two dependent loads. On an in-order CPU, that could lead to a small stall compared to holding the final function pointer in a register, with L1d cache load-use latency being more than 1 cycle. https://godbolt.org/z/MYcshjdhf shows GCC and clang for AArch64 and RISC-V. (And an attempt to do something with [Print address of virtual member function](https://stackoverflow.com/q/3068144) to encourage GCC to find the address once, but GCC doesn't allow casting a `void*` back to a pointer-to-member-function.) – Peter Cordes Mar 23 '23 at 15:34

2 Answers


Even doing this:

void function (Base& x, bool flag1, bool flag2, bool flag3) {
   if(flag1 || flag2 || flag3) {
      if(flag1)
         x.f();
      if(flag2)
         x.f();
      if(flag3)
         x.f();
   }
}

with GCC -O3 loads the vtable pointer (mov rax, QWORD PTR [rbx]) for every call:

function(Base&, bool, bool, bool):
        push    r12
        mov     r12d, ecx
        push    rbp
        mov     ebp, edx
        push    rbx
        mov     rbx, rdi
        test    sil, sil
        jne     .L2
        test    dl, dl
        jne     .L2
.L6:
        test    r12b, r12b
        jne     .L12
.L9:
        pop     rbx
        pop     rbp
        pop     r12
        ret
.L2:
        mov     rax, QWORD PTR [rbx]
        mov     rdx, QWORD PTR [rax]
        test    sil, sil
        je      .L7
        mov     rdi, rbx
        call    rdx
        test    bpl, bpl
        je      .L6
        mov     rax, QWORD PTR [rbx]
.L7:
        mov     rdi, rbx
        call    [QWORD PTR [rax]]
        test    r12b, r12b
        je      .L9
.L12:
        mov     rax, QWORD PTR [rbx]
        mov     rdi, rbx
        pop     rbx
        pop     rbp
        pop     r12
        mov     rax, QWORD PTR [rax]
        jmp     rax
Solomon Ucko

In C and C++ there is the as-if rule, so the compiler is allowed to perform the optimization you are expecting (if optimizations are enabled). The problem is that you have no guarantee that the compiler will actually do it. To see whether it happens, you have to check the assembly output, which is not pleasant.

Here you can see that all compilers emit call [QWORD PTR [rax]] (more or less) for the calls, so this was not optimized.
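If a single lookup really matters, one option is to spell it out in the source instead of hoping for the optimizer. The sketch below is only an illustration and assumes the interface can be changed: ManualBase, ManualDerived, f_impl, f_thunk and function_cached are hypothetical names, and the virtual function is replaced by a plain function pointer stored in the object so that the caller can copy it into a local variable.

```cpp
// Hypothetical replacement for the virtual f(): the object carries an
// ordinary function pointer that is filled in by the concrete type.
struct ManualBase {
    using Fn = void (*)(ManualBase&);
    explicit ManualBase(Fn fn) : f_impl(fn) {}
    Fn f_impl;
};

struct ManualDerived : ManualBase {
    int calls = 0;
    ManualDerived() : ManualBase(&f_thunk) {}

    // Static thunk: recovers the derived object and does the real work.
    static void f_thunk(ManualBase& self) {
        ++static_cast<ManualDerived&>(self).calls;
    }
};

void function_cached(ManualBase& x, bool flag1, bool flag2, bool flag3) {
    // The pointer is read from the object exactly once. The callee cannot
    // modify this local copy, so it does not need to be reloaded.
    const ManualBase::Fn f = x.f_impl;
    if (flag1) f(x);
    if (flag2) f(x);
    if (flag3) f(x);
}
```

Note that this changes the semantics slightly: if a call were to overwrite x.f_impl, the later calls would still go through the old pointer. Whether the compiler actually keeps f in a register across the calls still has to be confirmed in the assembly.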

Marek R