
Let's say we have a "master" class with a method called "Bulk" that performs N iterations over a virtual method.

This virtual method may be overridden by many classes, but only once per class hierarchy (the overriding classes are final). For performance reasons we have to minimize the cost of calling/vtable resolution as much as we can. (Example: 10+ Gb network packet generation)

One of my ideas to resolve this was to make the method Bulk virtual and "somehow" force it to be recompiled for each derived class, so that there is a single vtable lookup instead of N, plus possible gains from inlining/SSE/etc. However, reading the ASM, all I get is a generic "Bulk" method that still goes through the vtable N times.

Do you know any way to force that method to be recompiled (without copy-pasting its code into each derived class, of course), or any other way to reduce the calls and vtable lookups? I thought similar requirements would come up frequently, but I did not find anything...

Example code to play around:

master.hpp

#pragma once
#include <string>

class master
{
public:
    virtual unsigned Bulk(unsigned n)
    {
        unsigned ret = 0;
        for (unsigned i = 0; i < n; ++i)  // iterate n times (n was previously ignored)
            ret += once();

        return ret;
    }

    virtual unsigned once() = 0;
};

derived1.hpp

#pragma once
#include "master.hpp"

class derived1 final: public master
{
    virtual inline unsigned once() final { return 7; }
};

derived2.hpp

#pragma once
#include "master.hpp"

class derived2 final: public master
{
    virtual inline unsigned once() final { return 5; }
};

main.cpp

#include "derived1.hpp"
#include "derived2.hpp"
#include <iostream>
using namespace std;

int main()
{
    derived1 d1;
    derived2 d2;

    cout << d1.Bulk(144) << endl;
    cout << d2.Bulk(144) << endl;

    return 0;
}

Compile command I'm using: g++ main.cpp -S -O3 --std=gnu++17

Compiled Bulk Loop:

    movq    0(%rbp), %rax
    movq    %rbp, %rdi
    call    *8(%rax)
    addl    %eax, %r12d
    subl    $1, %ebx
    jne .L2
Ralequi
  • Does `once` actually need to be `public`, or is it only called in `Bulk`? If the latter, maybe it doesn't need to be virtual in the first place? – 463035818_is_not_an_ai Dec 17 '20 at 11:12
  • There is almost no performance cost for calling a virtual function (it just reads a pointer two more times). And a sane compiler will not re-read the pointer when calling the virtual function repeatedly. – Sprite Dec 17 '20 at 11:15
  • Thanks for your comment, @largest_prime_is_463035818. There is no problem making `once` private; however, if it is not virtual, there would be no way to call it correctly from `Bulk`. – Ralequi Dec 17 '20 at 11:24
  • Thank you for your comment, @Sprite. I've edited the post and included the loop's ASM. Unless I'm wrong, it is reading the vtable on each iteration. In any case, the explicit "call" prevents any further optimization. – Ralequi Dec 17 '20 at 11:37

1 Answer


I am not really understanding your question ;)

However, if you don't want virtual dispatch, I suggest avoiding it rather than trying to optimize around the virtual table (which is an implementation detail, so such optimizations won't be portable). Maybe CRTP is an option.

Just in case you want to use derivedX polymorphically, you can add a common base class:

#include <iostream>
#include <string>
using namespace std;

struct base {
    virtual std::string Bulk(unsigned n) = 0;
    virtual ~base(){}
};

template <typename T>
struct master : base {
    virtual std::string Bulk(unsigned n) {
        std::string ret = "";
        auto ptr = static_cast<T*>(this);
        for (unsigned i = 0; i < n; ++i) ret += ptr->once();  // unsigned index avoids a signed/unsigned comparison
        return ret;
    }
};

struct derived1 final : public master<derived1> {
    std::string once() { return "a"; }
};

struct derived2 final : public master<derived2> {
    std::string once() { return "b"; }
};

int main()
{
    derived1 d1;
    derived2 d2;

    cout << d1.Bulk(3) << endl;
    cout << d2.Bulk(3) << endl;
}
463035818_is_not_an_ai
  • I'm still checking and playing with your code, but from what I've seen so far it produces exactly the ASM I wished for. Thank you so much! – Ralequi Dec 17 '20 at 12:02