How to achieve total devirtualization of custom pmr allocators?

Question

Consider the following draft, which compiles, for a project that uses C++ polymorphic memory resources. To see what's going on, I overlaid a std::pmr::monotonic_buffer_resource with my LoggingResource:


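#include <array>
#include <cstddef>          // std::byte, size_t
#include <cstdio>
#include <memory_resource>
#include <string>           // std::pmr::string
#include <utility>          // std::move
#include <vector>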

struct pmr_aware_container
{
    using allocator_type = std::pmr::polymorphic_allocator<std::byte>;

    /* ctors */

    // default
    pmr_aware_container() : pmr_aware_container{allocator_type{}} {} // delegate to aa constructor

    explicit pmr_aware_container(const allocator_type alloc)
        : str_("Hello long string!!!", alloc) {
        printf("default constructor called!\n");
    }

    // copy
    // pmr_aware_container(const pmr_aware_container&) = default;

    pmr_aware_container(const pmr_aware_container& other, allocator_type alloc = {}) 
        : str_(other.str_, alloc) {
        printf("Copy constructor called!\n");
    }

    // move
    pmr_aware_container(pmr_aware_container&& other) noexcept
        : str_{std::move(other.str_), other.get_allocator() }
    {
        printf("Noexcept move constructor called!\n");
    }

    pmr_aware_container(pmr_aware_container&& other, const allocator_type& alloc)
        : str_(std::move(other.str_), alloc)
    {
        printf("Specific move constructor called!\n");
    }

    // assignment

    pmr_aware_container& operator=(const pmr_aware_container& rhs) = default;
    pmr_aware_container& operator=(pmr_aware_container&& rhs) = default;

    ~pmr_aware_container() = default;

    allocator_type get_allocator() const {
        return str_.get_allocator();
    }

    std::pmr::string str_ = "Hello long string!!!";
};


class LoggingResource : public std::pmr::memory_resource
{
public:
    LoggingResource(std::pmr::memory_resource *underlying_resource) : underlying_resource_{underlying_resource} { }

private:
    void *do_allocate(size_t bytes, size_t align) override {
        // %zu is the correct conversion for size_t (%d is undefined behavior here)
        printf("Allocating %zu bytes!\n", bytes);
        return underlying_resource_->allocate(bytes, align);
    }

    void do_deallocate(void* p, size_t bytes, size_t align) override {
        underlying_resource_->deallocate(p, bytes, align);
    }

    bool do_is_equal(std::pmr::memory_resource const& other) const noexcept override {
        return underlying_resource_->is_equal(other);
    }

    std::pmr::memory_resource* underlying_resource_;
};

int main()
{
    std::array<std::byte, 2024> buf;
    std::pmr::monotonic_buffer_resource mbs{buf.data(), buf.size()};

    LoggingResource log_resource{&mbs};

    std::pmr::vector<pmr_aware_container> v{ { pmr_aware_container{ &log_resource}, pmr_aware_container{ &log_resource} }, &log_resource};
}

After checking the binary I was quite surprised to find that the virtual calls made by my LoggingResource were not actually devirtualized, even with the newest compiler, gcc 12.1. I had assumed that devirtualization would be a requirement for pmr to be as fast as advertised. This is the respective vtable in the assembly:

vtable for LoggingResource:
        .quad   0
        .quad   typeinfo for LoggingResource
        .quad   LoggingResource::~LoggingResource() [complete object destructor]
        .quad   LoggingResource::~LoggingResource() [deleting destructor]
        .quad   LoggingResource::do_allocate(unsigned long, unsigned long)
        .quad   LoggingResource::do_deallocate(void*, unsigned long, unsigned long)
        .quad   LoggingResource::do_is_equal(std::pmr::memory_resource const&) const

I could see that there was an improvement from gcc 9.1 to 12.1, in that back then not even the monotonic buffer resource was devirtualized. Output for gcc 9.1:

vtable for std::pmr::monotonic_buffer_resource:
        .quad   0
        .quad   typeinfo for std::pmr::monotonic_buffer_resource
        .quad   std::pmr::monotonic_buffer_resource::~monotonic_buffer_resource() [complete object destructor]
        .quad   std::pmr::monotonic_buffer_resource::~monotonic_buffer_resource() [deleting destructor]
        .quad   std::pmr::monotonic_buffer_resource::do_allocate(unsigned long, unsigned long)
        .quad   std::pmr::monotonic_buffer_resource::do_deallocate(void*, unsigned long, unsigned long)
        .quad   std::pmr::monotonic_buffer_resource::do_is_equal(std::pmr::memory_resource const&) const
vtable for LoggingResource:
        .quad   0
        .quad   typeinfo for LoggingResource
        .quad   LoggingResource::~LoggingResource() [complete object destructor]
        .quad   LoggingResource::~LoggingResource() [deleting destructor]
        .quad   LoggingResource::do_allocate(unsigned long, unsigned long)
        .quad   LoggingResource::do_deallocate(void*, unsigned long, unsigned long)
        .quad   LoggingResource::do_is_equal(std::pmr::memory_resource const&) const

This could prove performance critical if I try to nest different memory allocators, like a pool allocator on top of a monotonic buffer resource, etc. However, what made me curious is that std::pmr::monotonic_buffer_resource did actually devirtualize. How is this possible, and how can I achieve the same for my allocators?
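For comparison, here is a minimal sketch of the alternative that needs no devirtualization at all: a classic template allocator over a concrete arena type, so every allocation call is resolved at compile time. Note that this is not pmr. `Arena`, `ArenaAllocator` and `bytes_used_after_reserve` are hypothetical names made up for this illustration, and the arena is a simple bump allocator that never frees.

```cpp
#include <cstddef>
#include <memory>   // std::align
#include <new>      // std::bad_alloc
#include <vector>

// Hypothetical bump arena with no virtual functions (not a std::pmr type).
class Arena {
public:
    Arena(std::byte* buf, std::size_t size) noexcept
        : begin_{buf}, cur_{buf}, end_{buf + size} {}

    void* allocate(std::size_t bytes, std::size_t align) {
        void* p = cur_;
        std::size_t space = static_cast<std::size_t>(end_ - cur_);
        // std::align moves p up to the next suitably aligned address, or returns nullptr.
        if (std::align(align, bytes, p, space) == nullptr) throw std::bad_alloc{};
        cur_ = static_cast<std::byte*>(p) + bytes;
        return p;
    }

    std::size_t used() const noexcept { return static_cast<std::size_t>(cur_ - begin_); }

private:
    std::byte* begin_;
    std::byte* cur_;
    std::byte* end_;
};

// Hypothetical allocator: the call into Arena::allocate is an ordinary non-virtual call.
template <class T>
class ArenaAllocator {
public:
    using value_type = T;

    explicit ArenaAllocator(Arena& arena) noexcept : arena_{&arena} {}
    template <class U>
    ArenaAllocator(const ArenaAllocator<U>& other) noexcept : arena_{other.arena()} {}

    T* allocate(std::size_t n) {
        return static_cast<T*>(arena_->allocate(n * sizeof(T), alignof(T)));
    }
    void deallocate(T*, std::size_t) noexcept {}  // bump arena: memory is never returned

    Arena* arena() const noexcept { return arena_; }

private:
    Arena* arena_;
};

template <class T, class U>
bool operator==(const ArenaAllocator<T>& a, const ArenaAllocator<U>& b) noexcept {
    return a.arena() == b.arena();
}
template <class T, class U>
bool operator!=(const ArenaAllocator<T>& a, const ArenaAllocator<U>& b) noexcept {
    return !(a == b);
}

// Reserves room for 16 ints and reports how many arena bytes that consumed.
std::size_t bytes_used_after_reserve() {
    alignas(std::max_align_t) std::byte buf[1024];
    Arena arena{buf, sizeof buf};
    ArenaAllocator<int> alloc{arena};
    std::vector<int, ArenaAllocator<int>> v(alloc);
    v.reserve(16);
    return arena.used();
}
```

The trade-off is that the allocator becomes part of the container's type, which is exactly what pmr's type erasure is designed to avoid, so this only fits where the resource type is fixed at compile time.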

The presence of vtable may not be a bad thing. What function is actually called at the call site? I see `call LoggingResource::do_allocate(unsigned long, unsigned long)` in godbolt — Osyotr, Aug 05 '22 at 20:09
@Osyotr that is indeed what is called. Interestingly, do_deallocate is fleshed out in assembly but never called (even though it would be a no-op for the monotonic buffer) — glades, Aug 05 '22 at 20:39
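
To illustrate the point raised in the comments, here is a minimal sketch, assuming a hypothetical `CountingResource` (not part of the original code). A vtable is normally emitted for any polymorphic class whose objects get constructed, because the constructor has to store the vptr, so its presence alone says little about individual call sites. What matters is whether the compiler can see the dynamic type where the call is made. Whether a direct call is really emitted depends on the compiler, optimization level and inlining, so the assembly at the call site remains the thing to check.

```cpp
#include <cstddef>
#include <memory_resource>

// Hypothetical resource, marked final so that no further override can exist.
class CountingResource final : public std::pmr::memory_resource {
public:
    explicit CountingResource(std::pmr::memory_resource* upstream) noexcept
        : upstream_{upstream} {}

    std::size_t allocations() const noexcept { return allocations_; }

private:
    void* do_allocate(std::size_t bytes, std::size_t align) override {
        ++allocations_;
        return upstream_->allocate(bytes, align);  // upstream call goes through a base pointer
    }
    void do_deallocate(void* p, std::size_t bytes, std::size_t align) override {
        upstream_->deallocate(p, bytes, align);
    }
    bool do_is_equal(const std::pmr::memory_resource& other) const noexcept override {
        return this == &other;
    }

    std::pmr::memory_resource* upstream_;
    std::size_t allocations_ = 0;
};

// The dynamic type of r is visible here, so once memory_resource::allocate is
// inlined the optimizer may call CountingResource::do_allocate directly.
std::size_t count_with_known_type() {
    CountingResource r{std::pmr::new_delete_resource()};
    void* p = r.allocate(32);
    r.deallocate(p, 32);
    return r.allocations();
}

// Only the base type is known here (this is the situation inside
// std::pmr::polymorphic_allocator), so the call generally stays indirect
// unless this function is inlined into a caller that knows the real type.
void* allocate_through_base(std::pmr::memory_resource& r, std::size_t bytes) {
    return r.allocate(bytes);
}
```

Note that `memory_resource::allocate` is a non-virtual wrapper that calls `do_allocate` through the base class, so `final` on the derived class helps only after that wrapper has been inlined into code that knows the object is a `CountingResource`.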
