Preserving block-allocator for boost::deque?

Question

I want to make boost::container::deque to reuse the freed blocks instead of deallocating them and then allocating new ones.

The only configuration option allowed by boost::container::deque is the compile-time specification of the block size (in terms of either items or bytes count).

Indeed, specifying the block size is something that I want to use, but I also would like to specify the number of blocks that will be preserved after becoming free and reused when new blocks will be required. However, as illustrated here, for boost::container::deque this number is 0, so it deallocates as soon as a block becomes free! I want to make a deque with this number equal to 1.

I see an opportunity to achieve this by specifying a custom allocator. Consider this ugly one:

template < typename Block >
struct PreservingAllocator : std::allocator<Block>
{
  using std::allocator<Block>::allocator;

  Block* allocate(size_t nn)
  {
    if (nn == 1) if (auto oldBlock = m_reserve->exchange(nullptr); !!oldBlock) return oldBlock;
    return std::allocator<Block>::allocate(nn);
  }

  void deallocate(Block* block, size_t nn)
  {
    if (nn == 1) block = m_reserve->exchange(block);
    if (!!block) std::allocator<Block>::deallocate(block, nn);
  }

private:
  static constexpr auto Deleter = [](std::atomic<Block*>* pointer)
  {
    if (!!pointer) if (auto block = pointer->exchange(nullptr); !!block)
      std::allocator<Block>{}.deallocate(block,1);
   delete pointer;
  };

  std::shared_ptr<std::atomic<Block*>> m_reserve = {new std::atomic<Block*>{nullptr},Deleter};
};

So the questions are.

How can I specify an allocator of blocks for boost::container::deque (blocks, not items!)?
If there is a way, then will such specification support allocators with state?
If yes, then will the above-mentioned allocator go for it?
After all, if not in this way, how can I make a deque that will not deallocate at least one of its freed blocks and will reuse it later when a new block will be needed?

If you use a pool alloctator, you will have this behaviour exactly. That's why allocators are an "extension" point for standard containers in the first place. The dequq doesn't allocate single items anyways (that's in the specification of the datastructure). — sehe, Aug 09 '20 at 12:55

score 2 · Accepted Answer · answered Aug 10 '20 at 17:07

This gave me an excuse to play more with allocators. I opted for the polymorphic allocators - although that's only tangentially related¹.

Aside: The relation is that with custom allocators you often want to propagate the allocator to nested types that are allocator-aware. See "Advanced" below

Sample Element Type

struct X {
    std::string key, value;
};

It doesn't get much simpler, although it allows us to experiment with sharing the allocator with the nested strings later.

Tracing Memory Resource

Let's create a tracing memory resource. That's pretty straight forward, and we'll just forward to standard new/delete:

namespace pmr = boost::container::pmr;

struct tracing_resource : pmr::memory_resource {
    uint64_t n = 0, total_bytes = 0;

    virtual void* do_allocate(std::size_t bytes, std::size_t alignment) override {
        n += 1;
        total_bytes += bytes;
        return pmr::new_delete_resource()->allocate(bytes, alignment);
    }

    virtual void do_deallocate(void* p, std::size_t bytes, std::size_t alignment) override {
        if (p) {
            n -= 1;
            total_bytes -= bytes;
        }
        return pmr::new_delete_resource()->deallocate(p, bytes, alignment);
    }

    virtual bool do_is_equal(const memory_resource& other) const noexcept override {
        return pmr::new_delete_resource()->is_equal(other);
    }
};

We can inspect n (number of allocation) as well as the total bytes allocated at various points throughout our test code.

A Test Routine

Let's put it together in main, starting with our tracer:

tracing_resource tracer;

Let's mount a pooled resource on that:

pmr::unsynchronized_pool_resource res(&tracer);

auto allocations = [&] {
    fmt::print("alloc: #{}, {} bytes, cached = {}\n", tracer.n, tracer.total_bytes, cache_buckets(res));
};
allocations();

This will print

alloc: #0, 0 bytes, cached = {0, 0, 0, 0, 0, 0, 0, 0, 0}

right out of the gate.

Now, let's start (re)allocating some deques in various patterns:

pmr::deque<X> collection(&res);
auto report = [&] {
    fmt::print("collection = {}\nalloc: #{}, {} bytes, cached = {}\n", collection, tracer.n, tracer.total_bytes, cache_buckets(res));
};

std::vector data1 { X{"1", "eins"}, {"2", "zwei"},  {"3", "drei"}, };
std::vector data2 { X{"4", "vier"},   {"5", "fuenf"}, {"6", "sechs"}, };
std::vector data3 { X{"7", "sieben"}, {"8", "acht"},  {"9", "neun"}, };

auto i = 0;
for (auto const& data : {data1, data2, data3}) {
    for (auto el : data) {
        (i%2)
            ? collection.push_back(el)
            : collection.push_front(el);
    }

    report();

    collection.clear();
    report();
}

This will append varying sequences at varying ends of the container. We won't do much mutation, as this will become interesting when we make the strings also use the pooled resource).

Live Demo

Live On Compiler Explorer

#include <boost/container/pmr/deque.hpp>
#include <boost/container/pmr/unsynchronized_pool_resource.hpp>

// debug output
#include <range/v3/all.hpp>
#include <fmt/ranges.h>
#include <fmt/ostream.h>
#include <iomanip>

namespace pmr = boost::container::pmr;

struct tracing_resource : pmr::memory_resource {
    uint64_t n = 0, total_bytes = 0;

    virtual void* do_allocate(std::size_t bytes, std::size_t alignment) override {
        n += 1;
        total_bytes += bytes;
        return pmr::new_delete_resource()->allocate(bytes, alignment);
    }

    virtual void do_deallocate(void* p, std::size_t bytes, std::size_t alignment) override {
        if (p) {
            n -= 1;
            total_bytes -= bytes;
        }
        return pmr::new_delete_resource()->deallocate(p, bytes, alignment);
    }

    virtual bool do_is_equal(const memory_resource& other) const noexcept override {
        return pmr::new_delete_resource()->is_equal(other);
    }
};

struct X {
    std::string key, value;

    friend std::ostream& operator<<(std::ostream& os, X const& x) {
        return os << "(" << std::quoted(x.key) << ", " << std::quoted(x.value) << ")";
    }
};

auto cache_buckets(pmr::unsynchronized_pool_resource& res) {
    using namespace ::ranges;
    return views::iota(0ull)
        | views::take_exactly(res.pool_count())
        | views::transform([&](auto idx) {
            return res.pool_cached_blocks(idx);
        });
}

int main() {
    tracing_resource tracer;
    {
        pmr::unsynchronized_pool_resource res(&tracer);

        auto allocations = [&] {
            fmt::print("alloc: #{}, {} bytes, cached = {}\n", tracer.n, tracer.total_bytes, cache_buckets(res));
        };
        allocations();

        {
            pmr::deque<X> collection(&res);
            auto report = [&] {
                fmt::print("collection = {}\nalloc: #{}, {} bytes, cached = {}\n", collection, tracer.n, tracer.total_bytes, cache_buckets(res));
            };

            std::vector data1 { X{"1", "eins"}, {"2", "zwei"},  {"3", "drei"}, };
            std::vector data2 { X{"4", "vier"},   {"5", "fuenf"}, {"6", "sechs"}, };
            std::vector data3 { X{"7", "sieben"}, {"8", "acht"},  {"9", "neun"}, };
                            
            auto i = 0;
            for (auto const& data : {data1, data2, data3}) {
                for (auto el : data) {
                    (i%2)
                        ? collection.push_back(el)
                        : collection.push_front(el);
                }

                report();

                collection.clear();
                report();
            }
        }

        allocations();
    }

    fmt::print("alloc: #{}, {} bytes\n", tracer.n, tracer.total_bytes);
}

Prints

alloc: #0, 0 bytes, cached = {0, 0, 0, 0, 0, 0, 0, 0, 0}
collection = {("3", "drei"), ("2", "zwei"), ("1", "eins")}
alloc: #4, 1864 bytes, cached = {0, 0, 0, 0, 0, 1, 0, 0, 0}
collection = {}
alloc: #4, 1864 bytes, cached = {0, 0, 0, 0, 0, 2, 0, 0, 0}
collection = {("6", "sechs"), ("5", "fuenf"), ("4", "vier")}
alloc: #4, 1864 bytes, cached = {0, 0, 0, 0, 0, 2, 0, 0, 0}
collection = {}
alloc: #4, 1864 bytes, cached = {0, 0, 0, 0, 0, 2, 0, 0, 0}
collection = {("9", "neun"), ("8", "acht"), ("7", "sieben")}
alloc: #4, 1864 bytes, cached = {0, 0, 0, 0, 0, 2, 0, 0, 0}
collection = {}
alloc: #4, 1864 bytes, cached = {0, 0, 0, 0, 0, 2, 0, 0, 0}
alloc: #4, 1864 bytes, cached = {0, 0, 1, 0, 0, 3, 0, 0, 0}
alloc: #0, 0 bytes

Advanced

As promised, we can make the element type allocator aware and show propagation:

 #include <boost/container/pmr/string.hpp>
 // ...

 struct X {
     using allocator_type = pmr::polymorphic_allocator<X>;
 
     template<typename K, typename V>
     explicit X(K&& key, V&& value, allocator_type a = {})
         : key(std::forward<K>(key), a), value(std::forward<V>(value), a) {}
 
     pmr::string key, value;
 };

Let's amend the test driver to emplace the elements from string literals:

std::vector data1 { std::pair{"1", "eins"}, {"2", "zwei"},  {"3", "drei"}, };
std::vector data2 { std::pair{"4", "vier"},   {"5", "fuenf"}, {"6", "sechs"}, };
std::vector data3 { std::pair{"7", "sieben"}, {"8", "acht"},  {"9", "neun"}, };
                
auto i = 0;
for (auto const& data : {data1, data2, data3}) {
    for (auto [k,v] : data) {
        (i%2)
            ? collection.emplace_back(k, v)
            : collection.emplace_front(k, v);
    }

For good measure, let's also mutate one of the nested string values:

    collection.at(1).value.append(50, '*'); // thwart SSO
    report();

    collection.at(1).value = "sept";
    report();

Again, demoing Live On Compiler Explorer

#include <boost/container/pmr/deque.hpp>
#include <boost/container/pmr/string.hpp>
#include <boost/container/pmr/unsynchronized_pool_resource.hpp>

// debug output
#include <range/v3/all.hpp>
#include <fmt/ranges.h>
#include <fmt/ostream.h>
#include <iomanip>

namespace pmr = boost::container::pmr;

struct tracing_resource : pmr::memory_resource {
    uint64_t n = 0, total_bytes = 0;

    virtual void* do_allocate(std::size_t bytes, std::size_t alignment) override {
        n += 1;
        total_bytes += bytes;
        return pmr::new_delete_resource()->allocate(bytes, alignment);
    }

    virtual void do_deallocate(void* p, std::size_t bytes, std::size_t alignment) override {
        if (p) {
            n -= 1;
            total_bytes -= bytes;
        }
        return pmr::new_delete_resource()->deallocate(p, bytes, alignment);
    }

    virtual bool do_is_equal(const memory_resource& other) const noexcept override {
        return pmr::new_delete_resource()->is_equal(other);
    }
};

struct X {
    using allocator_type = pmr::polymorphic_allocator<X>;

    template<typename K, typename V>
    explicit X(K&& key, V&& value, allocator_type a = {})
        : key(std::forward<K>(key), a), value(std::forward<V>(value), a) {}

    pmr::string key, value;

    friend std::ostream& operator<<(std::ostream& os, X const& x) {
        return os << "(" << std::quoted(x.key.c_str()) << ", " << std::quoted(x.value.c_str()) << ")";
    }
};

auto cache_buckets(pmr::unsynchronized_pool_resource& res) {
    using namespace ::ranges;
    return views::iota(0ull)
        | views::take_exactly(res.pool_count())
        | views::transform([&](auto idx) {
            return res.pool_cached_blocks(idx);
        });
}

int main() {
    tracing_resource tracer;
    {
        pmr::unsynchronized_pool_resource res(&tracer);

        auto allocations = [&] {
            fmt::print("alloc: #{}, {} bytes, cached = {}\n", tracer.n, tracer.total_bytes, cache_buckets(res));
        };
        allocations();

        {
            pmr::deque<X> collection(&res);
            auto report = [&] {
                fmt::print("collection = {}\nalloc: #{}, {} bytes, cached = {}\n", collection, tracer.n, tracer.total_bytes, cache_buckets(res));
            };

            std::vector data1 { std::pair{"1", "eins"}, {"2", "zwei"},  {"3", "drei"}, };
            std::vector data2 { std::pair{"4", "vier"},   {"5", "fuenf"}, {"6", "sechs"}, };
            std::vector data3 { std::pair{"7", "sieben"}, {"8", "acht"},  {"9", "neun"}, };
                            
            auto i = 0;
            for (auto const& data : {data1, data2, data3}) {
                for (auto [k,v] : data) {
                    (i%2)
                        ? collection.emplace_back(k, v)
                        : collection.emplace_front(k, v);
                }

                report();

                collection.at(1).value.append(50, '*'); // thwart SSO
                report();

                collection.at(1).value = "sept";
                report();

                collection.clear();
                report();
            }
        }

        allocations();
    }

    fmt::print("alloc: #{}, {} bytes\n", tracer.n, tracer.total_bytes);
}

Prints:

alloc: #0, 0 bytes, cached = {0, 0, 0, 0, 0, 0, 0, 0, 0}
collection = {("3", "drei"), ("2", "zwei"), ("1", "eins")}
alloc: #4, 1864 bytes, cached = {0, 0, 0, 0, 0, 1, 0, 0, 0}
collection = {("3", "drei"), ("2", "zwei**************************************************"), ("1", "eins")}
alloc: #5, 2008 bytes, cached = {0, 0, 0, 0, 0, 1, 0, 0, 0}
collection = {("3", "drei"), ("2", "sept"), ("1", "eins")}
alloc: #5, 2008 bytes, cached = {0, 0, 0, 0, 0, 1, 0, 0, 0}
collection = {}
alloc: #5, 2008 bytes, cached = {0, 0, 0, 1, 0, 2, 0, 0, 0}
collection = {("6", "sechs"), ("5", "fuenf"), ("4", "vier")}
alloc: #5, 2008 bytes, cached = {0, 0, 0, 1, 0, 2, 0, 0, 0}
collection = {("6", "sechs"), ("5", "fuenf**************************************************"), ("4", "vier")}
alloc: #5, 2008 bytes, cached = {0, 0, 0, 0, 0, 2, 0, 0, 0}
collection = {("6", "sechs"), ("5", "sept"), ("4", "vier")}
alloc: #5, 2008 bytes, cached = {0, 0, 0, 0, 0, 2, 0, 0, 0}
collection = {}
alloc: #5, 2008 bytes, cached = {0, 0, 0, 1, 0, 2, 0, 0, 0}
collection = {("9", "neun"), ("8", "acht"), ("7", "sieben")}
alloc: #5, 2008 bytes, cached = {0, 0, 0, 1, 0, 1, 0, 0, 0}
collection = {("9", "neun"), ("8", "acht**************************************************"), ("7", "sieben")}
alloc: #5, 2008 bytes, cached = {0, 0, 0, 0, 0, 1, 0, 0, 0}
collection = {("9", "neun"), ("8", "sept"), ("7", "sieben")}
alloc: #5, 2008 bytes, cached = {0, 0, 0, 0, 0, 1, 0, 0, 0}
collection = {}
alloc: #5, 2008 bytes, cached = {0, 0, 0, 1, 0, 2, 0, 0, 0}
alloc: #5, 2008 bytes, cached = {0, 0, 1, 1, 0, 3, 0, 0, 0}
alloc: #0, 0 bytes

Conclusions

While I chose poymorphic allocators because they support scoped_allocator_adaptor<>-style propagation by default, all of the above can be created with statically typed allocators.

It does demonstrate that if you use a pool allocator, the deque behaviour becomes pooled.

Aside: Pool allocators exist that are able to forgo cleanup, which can be valid in some scenarios, e.g. where the entire memory pool lives on the stack anyways. This is a common allocation optimization technique, allowing large numbers of deallocations/destructions to be skipped.

Addressing some more points of your list:

Q. How can I specify an allocator of blocks for boost::container::deque (blocks, not items!)?

A. I think the allocator is intrinsically always called for blocks, because of how deques work.
Q. If there is a way, then will such specification support allocators with state?

A. Standard library as well as Boost Container should all support stateful allocators these days. When in doubt, Boost Container has your back.
Q. If yes, then will the above-mentioned allocator go for it?

A. I didn't look closely at it, but you may put it in the same test-bed to find out
Q. After all, if not in this way, how can I make a deque that will not deallocate at least one of its freed blocks and will reuse it later when a new block will be needed?

A. See above. I'm not sure I understand the exact intent of the "at least one" clause, but I did note that Boost's deque implementation does do a "private map" allocation — presumably for some block accounting overhead — that stays around until the deque object is destroyed. This allocation doesn't happen at (default) construction, but later.

Added a implementation with non-polymorphic stateful allocators on top of Boost `object_pool` instead of PMR `memory_resource`s. See [other answer](https://stackoverflow.com/a/63344808/85371) — sehe, Aug 10 '20 at 20:55

score 2 · Answer 2 · answered Aug 10 '20 at 20:44

Couldn't leave well enough alone - I wanted to prove that it can be done with stateful statically-typed allocators just the same.

This builds on

Boost Container's basic_string and deque
Boost Container's scoped_allocator_adaptor to get allocator propagation on uses_allocator-construction
Boost Pool's object_pool for a scoped pool that uses segregated storage, optional block size hinting and ordered allocations with any UserAllocator
Finally, I used my own (surprise! I forgot about that earlier) stateful allocator to complement object_pool¹. It's the allocator in the non_boost namespace.

Note that if you don't mind singleton pools, you can just use Boist's own pool allocators. This is obviously recommended because mine is not maintained by Boost.

Note: I sacrificed the pooling statistics for the sake of cleanliness (it's way messier to add these to the allocator than it is to decorate a polymorphic memory_resource), but I guess the profiler knows best in the end