Why can't I wrap a T* in an std::vector?

Question

I have a T* addressing a buffer with len elements of type T. I need this data in the form of an std::vector<T>, for certain reasons. As far as I can tell, I cannot construct a vector which uses my buffer as its internal storage. Why is that?

Notes:

Please don't suggest I use iterators - I know that's usually the way around such issues.
I don't mind that the vector having to copy data around if it's resized later.
This question especially baffles me now that C++ has move semantics. If we can pull an object's storage from under its feet, why not be able to shove in our own?

Why don't you just move the elements in sequence into the vector? Presumably you are putting them into a vector so that the collection can be resized later, so you anticipate that they will be moved, right? If that will happen anyways, possibly several times, then avoiding it happening once could be a premature optimization. — Chris Beck, Feb 20 '16 at 23:36
@ChrisBeck: No, that's not why I'm putting them in the vector. Suppose I have some STLish code which takes a vector, not a raw array. — einpoklum, Feb 20 '16 at 23:51
I guess you can't use `std::array` either because of dynamic sizing or something presumably? Maybe there is some kind of wrapper type over a raw buffer like you have, which has an STL-like interface close enough to vector for your needs? My guess is that making something like that from scratch is probably easier then messing around with allocators — Chris Beck, Feb 21 '16 at 01:14
Previous comment only makes sense if you have the freedom to change the other code to take say a template parameter instead of `std::vector`. But, that may often be the case? If it is fairly generic anyways then it may be desirable. — Chris Beck, Feb 21 '16 at 01:53
Also re: "If we can pull an object's storage from under its feet, why not be able to shove in our own?", it reminds me of when people ask, "why does `std::string` make a dynamically allocated copy when I construct it from a string literal? That's inefficient." It's because, in C++ you don't pay for what you don't use. Default `std::string` type objects are assumed to be allocated a certain way so that it does not have to remember, and the dtor does not have to test, whether it should delete or not. This is sort of the same issue imo. — Chris Beck, Feb 21 '16 at 03:29
(Of course there are several other reasons that special `std::string` behavior for string literals wouldn't really work.) — Chris Beck, Feb 21 '16 at 03:38

score 14 · Accepted Answer · edited May 23 '17 at 11:46

You can.

You write about std::vector<T>, but std::vector takes two template arguments, not just one. The second template argument specifies the allocator type to use, and vector's constructors have overloads that allow passing in a custom instance of that allocator type.

So all you need to do is write an allocator that uses your own internal buffer where possible, and falls back to asking the default allocator when your own internal buffer is full.

The default allocator cannot possibly hope to handle it, since it would have no clue on which bits of memory can be freed and which cannot.

A sample stateful allocator with an internal buffer containing already-constructed elements that should not be overwritten by the vector, including a demonstration of a big gotcha:

struct my_allocator_state {
    void *buf;
    std::size_t len;
    bool bufused;
    const std::type_info *type;
};

template <typename T>
struct my_allocator {
    typedef T value_type;

    my_allocator(T *buf, std::size_t len)
        : def(), state(std::make_shared<my_allocator_state, my_allocator_state>({ buf, len, false, &typeid(T) })) { }

    template <std::size_t N>
    my_allocator(T(&buf)[N])
        : def(), state(std::make_shared<my_allocator_state, my_allocator_state>({ buf, N, false, &typeid(T) })) { }

    template <typename U>
    friend struct my_allocator;

    template <typename U>
    my_allocator(my_allocator<U> other)
        : def(), state(other.state) { }

    T *allocate(std::size_t n)
    {
        if (!state->bufused && n == state->len && typeid(T) == *state->type)
        {
            state->bufused = true;
            return static_cast<T *>(state->buf);
        }
        else
            return def.allocate(n);
    }

    void deallocate(T *p, std::size_t n)
    {
        if (p == state->buf)
            state->bufused = false;
        else
            def.deallocate(p, n);
    }

    template <typename...Args>
    void construct(T *c, Args... args)
    {
        if (!in_buffer(c))
            def.construct(c, std::forward<Args>(args)...);
    }

    void destroy(T *c)
    {
        if (!in_buffer(c))
            def.destroy(c);
    }

    friend bool operator==(const my_allocator &a, const my_allocator &b) {
        return a.state == b.state;
    }

    friend bool operator!=(const my_allocator &a, const my_allocator &b) {
        return a.state != b.state;
    }

private:
    std::allocator<T> def;
    std::shared_ptr<my_allocator_state> state;

    bool in_buffer(T *p) {
        return *state->type == typeid(T)
            && points_into_buffer(p, static_cast<T *>(state->buf), state->len);
    }
};

int main()
{
    int buf [] = { 1, 2, 3, 4 };
    std::vector<int, my_allocator<int>> v(sizeof buf / sizeof *buf, {}, buf);
    v.resize(3);
    v.push_back(5);
    v.push_back(6);
    for (auto &i : v) std::cout << i << std::endl;
}

Output:

The push_back of 5 fits into the old buffer, so construction is bypassed. When 6 is added, new memory is allocated, and everything starts acting as normal. You could avoid that problem by adding a method to your allocator to indicate that from that point onward, construction should not be bypassed any longer.

points_into_buffer turned out to be the hardest part to write, and I've omitted that from my answer. The intended semantics should be obvious from how I'm using it. Please see my question here for a portable implementation in my answer there, or if your implementation allows it, use one of the simpler versions in that other question.

By the way, I'm not really happy with how some implementations use rebind in such ways that there is no avoiding storing run-time type info along with the state, but if your implementation doesn't need that, you could make it a bit simpler by making the state a template class (or a nested class) too.

Is such an allocator available as part of the standard library, or boost maybe? — einpoklum, Jan 01 '15 at 11:57
@einpoklum Boost does have some custom allocators, but I'm not aware of any that has the specific behaviour you're looking for. — , Jan 01 '15 at 12:01
Will this work? We want to mix assignment/initialization from with reusing the buffer from (and not do pointless self copies). — Yakk - Adam Nevraumont, Jan 01 '15 at 13:48
@Yakk That's a good point: a custom allocator is assumed to return uninitialised memory, so this doesn't handle the vector's attempt to re-initialise the buffer. A well-written initialisation from that buffer where the compiler can tell that the newly-constructed object should have exactly the same value as what was already stored there should be optimised away, but the way to write that will be rather implementation-specific, falling back to the pointless copies you want to avoid on other implementations. I don't have a good answer for that, at least not off of the top of my head. — , Jan 01 '15 at 14:11
You might be able to tinker with the allocator's `construct()` and `destroy()` to achieve that, but it would be really messy. — T.C., Jan 01 '15 at 18:22
@hvd: So you're actually telling me that this _can't_ be done with just an allocator? I want simple wrapping, not convoluted wriggling to trick the compiler into doing things. — einpoklum, Jan 02 '15 at 17:54
@einpoklum It wasn't clear to me until Yakk's comment that you didn't just want to use the specified buffer, but you also wanted to use its existing contents. I didn't see how to do that at first, but T.C.'s comment covers it: you can make your allocator initially not perform any action during construction. This wouldn't rely on anything being optimised away. It would be a bit messy, but if you have a stateful allocator, not *really* messy: you could create your allocator, set it up to do nothing useful, create your vector, resize it, and finally change your allocator to do useful work. — , Jan 02 '15 at 18:31
@hvd: Can you add this to your actual answer? (I hope I have time to try it next week so I can accept...) — einpoklum, Jan 02 '15 at 19:53
@einpoklum Yeah, I agree, that probably would make it better. I'll try to add a simple self-contained sample when I can. — , Jan 02 '15 at 19:57

score 7 · Answer 2 · answered Jan 01 '15 at 11:31

7

The short answer is that a vector can't use your buffer because it wasn't designed that way.

It makes sense, too. If a vector doesn't allocate its own memory, how does it resize the buffer when more items are added? It allocates a new buffer, but what does it do with the old one? Same applies to moving - if the vector doesn't control its own buffer, how can it give control of this buffer to another instance?

answered Jan 01 '15 at 11:31

zmbq

38,013
14
101
171

zmbq: Nope, doesn't make sense. First of all, suppose it's a const vector; it never resizes. Second, even if its mutable, resizing makes the buffer 'fair game' for reallocation and you don't get promised anything. A similar argument could be made about moving. – einpoklum Jan 02 '15 at 17:56
2

So how can the vector reallocate your original `T*` if it doesn't know how you got it? Answer - it can't. And hacking an allocator to track this still doesn't give you any way to inform the vector that the buffer is already populated. – Useless Jan 03 '15 at 15:19
unique_ptr would allow you to express the fact that the memory is now uniquely owned by the vector and may be deallocated. – Julian Morrison May 12 '20 at 16:33

einpoklum · Answer 3 · 2020-05-12T16:34:44.113

These days - you no longer need to wrap a T* in an std::vector, you can wrap it with an std::span (in C++20; before that - use gsl::span). A span offers you all the convenience of a standard library container - in fact, basically all relevant features of std::vector excluding changes to the size - with a very thin wrapper class. That's what you want to use, really.

For more on spans, read: What is a "span" and when should I use one?

Why can't I wrap a T* in an std::vector?

3 Answers3

Linked

Related