3

I want to make boost::container::deque to reuse the freed blocks instead of deallocating them and then allocating new ones.

The only configuration option allowed by boost::container::deque is the compile-time specification of the block size (in terms of either items or bytes count).

Indeed, specifying the block size is something that I want to use, but I also would like to specify the number of blocks that will be preserved after becoming free and reused when new blocks will be required. However, as illustrated here, for boost::container::deque this number is 0, so it deallocates as soon as a block becomes free! I want to make a deque with this number equal to 1.

I see an opportunity to achieve this by specifying a custom allocator. Consider this ugly one:

template < typename Block >
struct PreservingAllocator : std::allocator<Block>
{
  using std::allocator<Block>::allocator;

  Block* allocate(size_t nn)
  {
    if (nn == 1) if (auto oldBlock = m_reserve->exchange(nullptr); !!oldBlock) return oldBlock;
    return std::allocator<Block>::allocate(nn);
  }

  void deallocate(Block* block, size_t nn)
  {
    if (nn == 1) block = m_reserve->exchange(block);
    if (!!block) std::allocator<Block>::deallocate(block, nn);
  }

private:
  static constexpr auto Deleter = [](std::atomic<Block*>* pointer)
  {
    if (!!pointer) if (auto block = pointer->exchange(nullptr); !!block)
      std::allocator<Block>{}.deallocate(block,1);
   delete pointer;
  };

  std::shared_ptr<std::atomic<Block*>> m_reserve = {new std::atomic<Block*>{nullptr},Deleter};
};

So the questions are.

  1. How can I specify an allocator of blocks for boost::container::deque (blocks, not items!)?
  2. If there is a way, then will such specification support allocators with state?
  3. If yes, then will the above-mentioned allocator go for it?
  4. After all, if not in this way, how can I make a deque that will not deallocate at least one of its freed blocks and will reuse it later when a new block will be needed?
Vahagn
  • 4,670
  • 9
  • 43
  • 72
  • If you use a pool alloctator, you will have this behaviour exactly. That's why allocators are an "extension" point for standard containers in the first place. The dequq doesn't allocate single items anyways (that's in the specification of the datastructure). – sehe Aug 09 '20 at 12:55

2 Answers2

2

This gave me an excuse to play more with allocators. I opted for the polymorphic allocators - although that's only tangentially related¹.

Aside: The relation is that with custom allocators you often want to propagate the allocator to nested types that are allocator-aware. See "Advanced" below

Sample Element Type

struct X {
    std::string key, value;
};

It doesn't get much simpler, although it allows us to experiment with sharing the allocator with the nested strings later.

Tracing Memory Resource

Let's create a tracing memory resource. That's pretty straight forward, and we'll just forward to standard new/delete:

namespace pmr = boost::container::pmr;

struct tracing_resource : pmr::memory_resource {
    uint64_t n = 0, total_bytes = 0;

    virtual void* do_allocate(std::size_t bytes, std::size_t alignment) override {
        n += 1;
        total_bytes += bytes;
        return pmr::new_delete_resource()->allocate(bytes, alignment);
    }

    virtual void do_deallocate(void* p, std::size_t bytes, std::size_t alignment) override {
        if (p) {
            n -= 1;
            total_bytes -= bytes;
        }
        return pmr::new_delete_resource()->deallocate(p, bytes, alignment);
    }

    virtual bool do_is_equal(const memory_resource& other) const noexcept override {
        return pmr::new_delete_resource()->is_equal(other);
    }
};

We can inspect n (number of allocation) as well as the total bytes allocated at various points throughout our test code.

A Test Routine

Let's put it together in main, starting with our tracer:

tracing_resource tracer;

Let's mount a pooled resource on that:

pmr::unsynchronized_pool_resource res(&tracer);

auto allocations = [&] {
    fmt::print("alloc: #{}, {} bytes, cached = {}\n", tracer.n, tracer.total_bytes, cache_buckets(res));
};
allocations();

This will print

alloc: #0, 0 bytes, cached = {0, 0, 0, 0, 0, 0, 0, 0, 0}

right out of the gate.

Now, let's start (re)allocating some deques in various patterns:

pmr::deque<X> collection(&res);
auto report = [&] {
    fmt::print("collection = {}\nalloc: #{}, {} bytes, cached = {}\n", collection, tracer.n, tracer.total_bytes, cache_buckets(res));
};

std::vector data1 { X{"1", "eins"}, {"2", "zwei"},  {"3", "drei"}, };
std::vector data2 { X{"4", "vier"},   {"5", "fuenf"}, {"6", "sechs"}, };
std::vector data3 { X{"7", "sieben"}, {"8", "acht"},  {"9", "neun"}, };

auto i = 0;
for (auto const& data : {data1, data2, data3}) {
    for (auto el : data) {
        (i%2)
            ? collection.push_back(el)
            : collection.push_front(el);
    }

    report();

    collection.clear();
    report();
}

This will append varying sequences at varying ends of the container. We won't do much mutation, as this will become interesting when we make the strings also use the pooled resource).

Live Demo

Live On Compiler Explorer

#include <boost/container/pmr/deque.hpp>
#include <boost/container/pmr/unsynchronized_pool_resource.hpp>

// debug output
#include <range/v3/all.hpp>
#include <fmt/ranges.h>
#include <fmt/ostream.h>
#include <iomanip>

namespace pmr = boost::container::pmr;

struct tracing_resource : pmr::memory_resource {
    uint64_t n = 0, total_bytes = 0;

    virtual void* do_allocate(std::size_t bytes, std::size_t alignment) override {
        n += 1;
        total_bytes += bytes;
        return pmr::new_delete_resource()->allocate(bytes, alignment);
    }

    virtual void do_deallocate(void* p, std::size_t bytes, std::size_t alignment) override {
        if (p) {
            n -= 1;
            total_bytes -= bytes;
        }
        return pmr::new_delete_resource()->deallocate(p, bytes, alignment);
    }

    virtual bool do_is_equal(const memory_resource& other) const noexcept override {
        return pmr::new_delete_resource()->is_equal(other);
    }
};

struct X {
    std::string key, value;

    friend std::ostream& operator<<(std::ostream& os, X const& x) {
        return os << "(" << std::quoted(x.key) << ", " << std::quoted(x.value) << ")";
    }
};

auto cache_buckets(pmr::unsynchronized_pool_resource& res) {
    using namespace ::ranges;
    return views::iota(0ull)
        | views::take_exactly(res.pool_count())
        | views::transform([&](auto idx) {
            return res.pool_cached_blocks(idx);
        });
}

int main() {
    tracing_resource tracer;
    {
        pmr::unsynchronized_pool_resource res(&tracer);

        auto allocations = [&] {
            fmt::print("alloc: #{}, {} bytes, cached = {}\n", tracer.n, tracer.total_bytes, cache_buckets(res));
        };
        allocations();

        {
            pmr::deque<X> collection(&res);
            auto report = [&] {
                fmt::print("collection = {}\nalloc: #{}, {} bytes, cached = {}\n", collection, tracer.n, tracer.total_bytes, cache_buckets(res));
            };

            std::vector data1 { X{"1", "eins"}, {"2", "zwei"},  {"3", "drei"}, };
            std::vector data2 { X{"4", "vier"},   {"5", "fuenf"}, {"6", "sechs"}, };
            std::vector data3 { X{"7", "sieben"}, {"8", "acht"},  {"9", "neun"}, };
                            
            auto i = 0;
            for (auto const& data : {data1, data2, data3}) {
                for (auto el : data) {
                    (i%2)
                        ? collection.push_back(el)
                        : collection.push_front(el);
                }

                report();

                collection.clear();
                report();
            }
        }

        allocations();
    }

    fmt::print("alloc: #{}, {} bytes\n", tracer.n, tracer.total_bytes);
}

Prints

alloc: #0, 0 bytes, cached = {0, 0, 0, 0, 0, 0, 0, 0, 0}
collection = {("3", "drei"), ("2", "zwei"), ("1", "eins")}
alloc: #4, 1864 bytes, cached = {0, 0, 0, 0, 0, 1, 0, 0, 0}
collection = {}
alloc: #4, 1864 bytes, cached = {0, 0, 0, 0, 0, 2, 0, 0, 0}
collection = {("6", "sechs"), ("5", "fuenf"), ("4", "vier")}
alloc: #4, 1864 bytes, cached = {0, 0, 0, 0, 0, 2, 0, 0, 0}
collection = {}
alloc: #4, 1864 bytes, cached = {0, 0, 0, 0, 0, 2, 0, 0, 0}
collection = {("9", "neun"), ("8", "acht"), ("7", "sieben")}
alloc: #4, 1864 bytes, cached = {0, 0, 0, 0, 0, 2, 0, 0, 0}
collection = {}
alloc: #4, 1864 bytes, cached = {0, 0, 0, 0, 0, 2, 0, 0, 0}
alloc: #4, 1864 bytes, cached = {0, 0, 1, 0, 0, 3, 0, 0, 0}
alloc: #0, 0 bytes

Advanced

As promised, we can make the element type allocator aware and show propagation:

 #include <boost/container/pmr/string.hpp>
 // ...

 struct X {
     using allocator_type = pmr::polymorphic_allocator<X>;
 
     template<typename K, typename V>
     explicit X(K&& key, V&& value, allocator_type a = {})
         : key(std::forward<K>(key), a), value(std::forward<V>(value), a) {}
 
     pmr::string key, value;
 };

Let's amend the test driver to emplace the elements from string literals:

std::vector data1 { std::pair{"1", "eins"}, {"2", "zwei"},  {"3", "drei"}, };
std::vector data2 { std::pair{"4", "vier"},   {"5", "fuenf"}, {"6", "sechs"}, };
std::vector data3 { std::pair{"7", "sieben"}, {"8", "acht"},  {"9", "neun"}, };
                
auto i = 0;
for (auto const& data : {data1, data2, data3}) {
    for (auto [k,v] : data) {
        (i%2)
            ? collection.emplace_back(k, v)
            : collection.emplace_front(k, v);
    }

For good measure, let's also mutate one of the nested string values:

    collection.at(1).value.append(50, '*'); // thwart SSO
    report();

    collection.at(1).value = "sept";
    report();

Again, demoing Live On Compiler Explorer

#include <boost/container/pmr/deque.hpp>
#include <boost/container/pmr/string.hpp>
#include <boost/container/pmr/unsynchronized_pool_resource.hpp>

// debug output
#include <range/v3/all.hpp>
#include <fmt/ranges.h>
#include <fmt/ostream.h>
#include <iomanip>

namespace pmr = boost::container::pmr;

struct tracing_resource : pmr::memory_resource {
    uint64_t n = 0, total_bytes = 0;

    virtual void* do_allocate(std::size_t bytes, std::size_t alignment) override {
        n += 1;
        total_bytes += bytes;
        return pmr::new_delete_resource()->allocate(bytes, alignment);
    }

    virtual void do_deallocate(void* p, std::size_t bytes, std::size_t alignment) override {
        if (p) {
            n -= 1;
            total_bytes -= bytes;
        }
        return pmr::new_delete_resource()->deallocate(p, bytes, alignment);
    }

    virtual bool do_is_equal(const memory_resource& other) const noexcept override {
        return pmr::new_delete_resource()->is_equal(other);
    }
};

struct X {
    using allocator_type = pmr::polymorphic_allocator<X>;

    template<typename K, typename V>
    explicit X(K&& key, V&& value, allocator_type a = {})
        : key(std::forward<K>(key), a), value(std::forward<V>(value), a) {}

    pmr::string key, value;

    friend std::ostream& operator<<(std::ostream& os, X const& x) {
        return os << "(" << std::quoted(x.key.c_str()) << ", " << std::quoted(x.value.c_str()) << ")";
    }
};

auto cache_buckets(pmr::unsynchronized_pool_resource& res) {
    using namespace ::ranges;
    return views::iota(0ull)
        | views::take_exactly(res.pool_count())
        | views::transform([&](auto idx) {
            return res.pool_cached_blocks(idx);
        });
}

int main() {
    tracing_resource tracer;
    {
        pmr::unsynchronized_pool_resource res(&tracer);

        auto allocations = [&] {
            fmt::print("alloc: #{}, {} bytes, cached = {}\n", tracer.n, tracer.total_bytes, cache_buckets(res));
        };
        allocations();

        {
            pmr::deque<X> collection(&res);
            auto report = [&] {
                fmt::print("collection = {}\nalloc: #{}, {} bytes, cached = {}\n", collection, tracer.n, tracer.total_bytes, cache_buckets(res));
            };

            std::vector data1 { std::pair{"1", "eins"}, {"2", "zwei"},  {"3", "drei"}, };
            std::vector data2 { std::pair{"4", "vier"},   {"5", "fuenf"}, {"6", "sechs"}, };
            std::vector data3 { std::pair{"7", "sieben"}, {"8", "acht"},  {"9", "neun"}, };
                            
            auto i = 0;
            for (auto const& data : {data1, data2, data3}) {
                for (auto [k,v] : data) {
                    (i%2)
                        ? collection.emplace_back(k, v)
                        : collection.emplace_front(k, v);
                }

                report();

                collection.at(1).value.append(50, '*'); // thwart SSO
                report();

                collection.at(1).value = "sept";
                report();

                collection.clear();
                report();
            }
        }

        allocations();
    }

    fmt::print("alloc: #{}, {} bytes\n", tracer.n, tracer.total_bytes);
}

Prints:

alloc: #0, 0 bytes, cached = {0, 0, 0, 0, 0, 0, 0, 0, 0}
collection = {("3", "drei"), ("2", "zwei"), ("1", "eins")}
alloc: #4, 1864 bytes, cached = {0, 0, 0, 0, 0, 1, 0, 0, 0}
collection = {("3", "drei"), ("2", "zwei**************************************************"), ("1", "eins")}
alloc: #5, 2008 bytes, cached = {0, 0, 0, 0, 0, 1, 0, 0, 0}
collection = {("3", "drei"), ("2", "sept"), ("1", "eins")}
alloc: #5, 2008 bytes, cached = {0, 0, 0, 0, 0, 1, 0, 0, 0}
collection = {}
alloc: #5, 2008 bytes, cached = {0, 0, 0, 1, 0, 2, 0, 0, 0}
collection = {("6", "sechs"), ("5", "fuenf"), ("4", "vier")}
alloc: #5, 2008 bytes, cached = {0, 0, 0, 1, 0, 2, 0, 0, 0}
collection = {("6", "sechs"), ("5", "fuenf**************************************************"), ("4", "vier")}
alloc: #5, 2008 bytes, cached = {0, 0, 0, 0, 0, 2, 0, 0, 0}
collection = {("6", "sechs"), ("5", "sept"), ("4", "vier")}
alloc: #5, 2008 bytes, cached = {0, 0, 0, 0, 0, 2, 0, 0, 0}
collection = {}
alloc: #5, 2008 bytes, cached = {0, 0, 0, 1, 0, 2, 0, 0, 0}
collection = {("9", "neun"), ("8", "acht"), ("7", "sieben")}
alloc: #5, 2008 bytes, cached = {0, 0, 0, 1, 0, 1, 0, 0, 0}
collection = {("9", "neun"), ("8", "acht**************************************************"), ("7", "sieben")}
alloc: #5, 2008 bytes, cached = {0, 0, 0, 0, 0, 1, 0, 0, 0}
collection = {("9", "neun"), ("8", "sept"), ("7", "sieben")}
alloc: #5, 2008 bytes, cached = {0, 0, 0, 0, 0, 1, 0, 0, 0}
collection = {}
alloc: #5, 2008 bytes, cached = {0, 0, 0, 1, 0, 2, 0, 0, 0}
alloc: #5, 2008 bytes, cached = {0, 0, 1, 1, 0, 3, 0, 0, 0}
alloc: #0, 0 bytes

Conclusions

While I chose poymorphic allocators because they support scoped_allocator_adaptor<>-style propagation by default, all of the above can be created with statically typed allocators.

It does demonstrate that if you use a pool allocator, the deque behaviour becomes pooled.

Aside: Pool allocators exist that are able to forgo cleanup, which can be valid in some scenarios, e.g. where the entire memory pool lives on the stack anyways. This is a common allocation optimization technique, allowing large numbers of deallocations/destructions to be skipped.

Addressing some more points of your list:

  1. Q. How can I specify an allocator of blocks for boost::container::deque (blocks, not items!)?

    A. I think the allocator is intrinsically always called for blocks, because of how deques work.

  2. Q. If there is a way, then will such specification support allocators with state?

    A. Standard library as well as Boost Container should all support stateful allocators these days. When in doubt, Boost Container has your back.

  3. Q. If yes, then will the above-mentioned allocator go for it?

    A. I didn't look closely at it, but you may put it in the same test-bed to find out

  4. Q. After all, if not in this way, how can I make a deque that will not deallocate at least one of its freed blocks and will reuse it later when a new block will be needed?

    A. See above. I'm not sure I understand the exact intent of the "at least one" clause, but I did note that Boost's deque implementation does do a "private map" allocation — presumably for some block accounting overhead — that stays around until the deque object is destroyed. This allocation doesn't happen at (default) construction, but later.

sehe
  • 374,641
  • 47
  • 450
  • 633
  • Added a implementation with non-polymorphic stateful allocators on top of Boost `object_pool` instead of PMR `memory_resource`s. See [other answer](https://stackoverflow.com/a/63344808/85371) – sehe Aug 10 '20 at 20:55
2

Couldn't leave well enough alone - I wanted to prove that it can be done with stateful statically-typed allocators just the same.

This builds on

  • Boost Container's basic_string and deque

  • Boost Container's scoped_allocator_adaptor to get allocator propagation on uses_allocator-construction

  • Boost Pool's object_pool for a scoped pool that uses segregated storage, optional block size hinting and ordered allocations with any UserAllocator

  • Finally, I used my own (surprise! I forgot about that earlier) stateful allocator to complement object_pool¹. It's the allocator in the non_boost namespace.

    Note that if you don't mind singleton pools, you can just use Boist's own pool allocators. This is obviously recommended because mine is not maintained by Boost.


Note: I sacrificed the pooling statistics for the sake of cleanliness (it's way messier to add these to the allocator than it is to decorate a polymorphic memory_resource), but I guess the profiler knows best in the end

Live On Compiler Explorer

#include <boost/container/deque.hpp>
#include <boost/container/string.hpp>
#include <boost/container/scoped_allocator.hpp>
#include <boost/pool/pool_alloc.hpp>

// debug output
#include <range/v3/all.hpp>
#include <fmt/ranges.h>
#include <fmt/ostream.h>
#include <iomanip>

namespace bc = boost::container;

namespace non_boost {
    template <typename T, typename UserAllocator = boost::default_user_allocator_new_delete>
    class fast_pool_allocator
    {
      public:
        typedef T value_type;
        typedef UserAllocator user_allocator;

        typedef value_type * pointer;
        typedef const value_type * const_pointer;
        typedef value_type & reference;
        typedef const value_type & const_reference;
        typedef boost::pool<UserAllocator> pool_type;
        typedef typename pool_type::size_type       size_type;
        typedef typename pool_type::difference_type difference_type;

        template <typename U>
        struct rebind {
            typedef fast_pool_allocator<U, UserAllocator> other;
        };

        pool_type* _ref;
      public:
        fast_pool_allocator(pool_type& ref) : _ref(&ref) { }

        fast_pool_allocator(fast_pool_allocator const&) = default;
        fast_pool_allocator& operator=(fast_pool_allocator const&) = default;

        // Not explicit, mimicking std::allocator [20.4.1]
        template <typename U>
        fast_pool_allocator(const fast_pool_allocator<U, UserAllocator> & other) : _ref(other._ref)
        { }

        // Default destructor used.
        static pointer address(reference r)                     { return &r;                                      } 
        static const_pointer address(const_reference s)         { return &s;                                      } 
        static size_type max_size()                             { return (std::numeric_limits<size_type>::max)(); } 
        void construct(const pointer ptr, const value_type & t) { new (ptr) T(t);                                 } 
        void destroy(const pointer ptr)                         { ptr->~T();                                      } 

        bool operator==(fast_pool_allocator const& rhs) const { return _ref == rhs._ref; }
        bool operator!=(fast_pool_allocator const& rhs) const { return _ref != rhs._ref; }

        pointer allocate(const size_type n)
        {
            const pointer ret = (n == 1) 
                ? static_cast<pointer>( (_ref->malloc)() ) 
                : static_cast<pointer>( _ref->ordered_malloc(n) );
            if (ret == 0)
                boost::throw_exception(std::bad_alloc());
            return ret;
        }

        pointer allocate(const size_type n, const void * const) { return allocate(n); }
        pointer allocate()
        {
            const pointer ret = static_cast<pointer>( (_ref->malloc)() );
            if (ret == 0)
                boost::throw_exception(std::bad_alloc());
            return ret;
        }
        void deallocate(const pointer ptr, const size_type n)
        {
#ifdef BOOST_NO_PROPER_STL_DEALLOCATE
            if (ptr == 0 || n == 0)
                return;
#endif
            if (n == 1)
                (_ref->free)(ptr);
            else
                (_ref->free)(ptr, n);
        }
        void deallocate(const pointer ptr) { (_ref->free)(ptr); }
    };

    //Specialization of fast_pool_allocator<void> required to make the allocator standard-conforming.
    template<typename UserAllocator>
    class fast_pool_allocator<void, UserAllocator> {
    public:
        typedef void*       pointer;
        typedef const void* const_pointer;
        typedef void        value_type;

        template <class U> struct rebind {
            typedef fast_pool_allocator<U, UserAllocator> other;
        };
    };
}

template <typename T> using Alloc
    = bc::scoped_allocator_adaptor< non_boost::fast_pool_allocator<T> >;

struct X {
    using allocator_type = Alloc<X>;

    template<typename K, typename V>
    explicit X(K&& key, V&& value, allocator_type a)
        : key(std::forward<K>(key), a), value(std::forward<V>(value), a) {}

    bc::basic_string<char, std::char_traits<char>, Alloc<char> > key, value;

    friend std::ostream& operator<<(std::ostream& os, X const& x) {
        return os << "(" << std::quoted(x.key.c_str()) << ", " << std::quoted(x.value.c_str()) << ")";
    }
};

int main() {
    boost::pool<boost::default_user_allocator_new_delete> _pool { sizeof(X) };
    Alloc<X> alloc { _pool };

    bc::deque<X, Alloc<X> > collection(alloc);
    auto dump = [&] { fmt::print("collection = {}\n", collection); };

    std::vector data1 { std::pair{"1", "eins"}, {"2", "zwei"},  {"3", "drei"}, };
    std::vector data2 { std::pair{"4", "vier"},   {"5", "fuenf"}, {"6", "sechs"}, };
    std::vector data3 { std::pair{"7", "sieben"}, {"8", "acht"},  {"9", "neun"}, };

    auto i = 0;
    for (auto const& data : {data1, data2, data3}) {
        for (auto [k,v] : data) {
            (i%2)
                ? collection.emplace_back(k, v)
                : collection.emplace_front(k, v);
        }

        dump();

        collection.at(1).value.append(50, '*'); // thwart SSO
        dump();

        collection.at(1).value = "sept";
        dump();

        collection.clear();
        dump();
    }
}

¹ (see Is there some way to use boost::obect_pool with faster free operations and Is there a BOOST pool fixed-sized allocator?)

sehe
  • 374,641
  • 47
  • 450
  • 633
  • And here's proof that you can do it with modern standard library implementations (ie.. that implement stateful allocator support): [Live On Compiler Explorer](https://godbolt.org/z/TnE3o1) without Boost Container – sehe Aug 10 '20 at 20:56