How to handle heap allocated fixed length byte buffers?

Question

In C we use char* to point to a block of memory allocated with malloc and keep track of the size in a separate size/length variable.

What is the C++ equivalent? From what I've seen so far most people use std::vector. Typically .resize() is called to allocate the required memory and then data can be memcpyed into .data().
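For reference, the resize-then-memcpy pattern described above looks roughly like this. It is only a sketch: the helper name copy_into_vector is made up for illustration, and unsigned char stands in for the byte type.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: copies `len` bytes from `src` into a freshly sized vector.
std::vector<unsigned char> copy_into_vector(const void* src, std::size_t len) {
    std::vector<unsigned char> buf;
    buf.resize(len);                        // allocates and zero-fills len bytes
    if (len != 0) {
        std::memcpy(buf.data(), src, len);  // then overwrites every one of them
    }
    return buf;                             // moved or elided on return, not copied
}
```

Note that resize() writes zeros into the buffer before memcpy overwrites them, which is the redundant work complained about further down.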

Personally I think std::vector shouldn't be used as a fixed size byte buffer.

Imagine we have a library which returns a std::vector<byte> representing a fixed size byte buffer. We may encounter these problems:

Needlessly dynamic: When we pass our fixed size byte buffer around to various functions they're free to push_back or insert new data to the buffer. The intention is a fixed size buffer. The ability to dynamically size it is unnecessary, unwanted and confusing.
Unwieldy: If we want to avoid copying the buffer all the time we have to use combinations of std::move, passing by reference, and passing .data() and .size(). And if you forget a std::move somewhere, the whole buffer is silently copied. This is needlessly complicated and just annoying in my opinion. In many cases I think a raw array (allocated with malloc/free or new[]/delete[], perhaps wrapped in a std::span) would be easier to manage and less confusing, although surely there's a better C++ solution?
Forced initialization: People are using std::vector to store raw buffers when .resize() value-initializes (zeroes) every element, even if the data is about to be overwritten!?
Incompatible with C libraries: There's no way to release the underlying buffer from the grasp of the vector's destructor. So if you pass .data() into, for example, a zip_add_file(char* name, char* buffer, size_t len) function, the vector must remain alive until zip_write_files() is called, which frees all the given buffers. Even if it does stay alive, when it is eventually destroyed it will try to free the already freed memory. And std::vector<byte> allocates through its default allocator (operator new), not malloc, so the zip library will cause undefined behavior when it frees the buffer anyway.
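The first two points can be made concrete with a small sketch. The function names below are invented for illustration and unsigned char stands in for the byte type.

```cpp
#include <utility>
#include <vector>

// Nothing in this signature stops a callee from changing the "fixed" size.
void careless_callee(std::vector<unsigned char>& buf) {
    buf.push_back(0);
}

// Forgetting std::move: the new vector gets its own allocation and a full copy.
bool copy_reuses_storage() {
    std::vector<unsigned char> buf(1024);
    const unsigned char* original = buf.data();
    std::vector<unsigned char> copy = buf;
    return copy.data() == original;   // false: the bytes were duplicated
}

// With std::move the new vector takes over the existing allocation.
bool move_reuses_storage() {
    std::vector<unsigned char> buf(1024);
    const unsigned char* original = buf.data();
    std::vector<unsigned char> moved = std::move(buf);
    return moved.data() == original;  // true: no bytes were copied
}
```

The only difference between the last two functions is a single std::move, and the compiler accepts both without complaint.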

So is there a better way?

What is your question? What sort of answer do you expect? Seems more like a rant against `vector`. — Retired Ninja, Mar 10 '23 at 02:24
@RetiredNinja I want to know am I wrong about std::vector or is there another way I hadn't thought of etc — Dylan Bradshaw, Mar 10 '23 at 02:29
OK I moved my potential solution but not really out into an answer — Dylan Bradshaw, Mar 10 '23 at 02:31
@pm100 yeah I prefer it to a vector because you can append other buffers super easily which is sometimes useful. — Dylan Bradshaw, Mar 10 '23 at 23:56

score 0 · Answer 1 · answered Mar 10 '23 at 02:30

unique_ptr<byte[]> is almost exactly what we need. We can easily and intuitively make one with make_unique_for_overwrite<byte[]>(N) (C++20), which default-initializes the elements, so the bytes are left uninitialized rather than zeroed. If we return this from our library function instead of a std::vector<byte>, it's much easier to pass around and manage, with much greater flexibility (we can easily convert it to a shared_ptr, for example). It's a fixed size array that doesn't come with functions to resize, push_back or insert, which makes perfect sense. Bytes can be accessed by index with the [] operator. If we want compatibility with the C library mentioned earlier, we can construct a unique_ptr from a buffer allocated with malloc and give it free as its deleter (the deleter becomes part of the type, e.g. unique_ptr<byte[], decltype(&free)>). Then simply call .release() if we want a C library to take over. Painless.
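A minimal sketch of the malloc/release() hand-off described above. Assumptions: c_library_take is a stand-in for a C API that takes ownership and frees the buffer itself (it is not a real library function), the deleter is a small functor wrapping free rather than the function pointer, and unsigned char stands in for the byte type.

```cpp
#include <cstddef>
#include <cstdlib>
#include <memory>

// Deleter that releases memory with free(), matching malloc().
struct free_deleter {
    void operator()(void* p) const noexcept { std::free(p); }
};

// Owning pointer to a malloc'd byte buffer.
using c_buffer = std::unique_ptr<unsigned char[], free_deleter>;

// Hypothetical helper: allocate n uninitialized bytes with malloc.
c_buffer make_c_buffer(std::size_t n) {
    return c_buffer(static_cast<unsigned char*>(std::malloc(n)));
}

// Stand-in for a C function that consumes the buffer and frees it.
unsigned c_library_take(unsigned char* buffer, std::size_t len) {
    unsigned sum = 0;
    for (std::size_t i = 0; i < len; ++i) sum += buffer[i];
    std::free(buffer);
    return sum;
}

unsigned hand_over() {
    c_buffer buf = make_c_buffer(3);
    if (!buf) return 0;               // malloc failed
    buf[0] = 1; buf[1] = 2; buf[2] = 3;
    // release() gives up ownership, so the unique_ptr will not free it again.
    return c_library_take(buf.release(), 3);
}
```

If release() is never called, the functor frees the buffer when the unique_ptr goes out of scope, so both ownership paths end in exactly one free().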

Cons:

Unfortunately it is just a pointer to a basic byte[] array so there's no .size() available. unique_ptr does treat arrays and non-arrays differently by having different default deleters and only defining the [] operator for arrays so I feel like it would have been possible to add .size()?
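One way around the missing .size() is to keep the length next to the pointer in a small move-only class. This is just a sketch: the class name byte_buffer is invented here, and unsigned char stands in for the byte type.

```cpp
#include <cstddef>
#include <memory>
#include <utility>

// Hypothetical fixed-size owning byte buffer: a unique_ptr plus its length.
class byte_buffer {
public:
    // new unsigned char[n] default-initializes, so the bytes are not zeroed.
    explicit byte_buffer(std::size_t n)
        : data_(new unsigned char[n]), size_(n) {}

    // Move-only; a moved-from buffer reports size 0.
    byte_buffer(byte_buffer&& other) noexcept
        : data_(std::move(other.data_)), size_(std::exchange(other.size_, 0)) {}
    byte_buffer& operator=(byte_buffer&& other) noexcept {
        data_ = std::move(other.data_);
        size_ = std::exchange(other.size_, 0);
        return *this;
    }

    std::size_t size() const noexcept { return size_; }
    unsigned char* data() noexcept { return data_.get(); }
    const unsigned char* data() const noexcept { return data_.get(); }
    unsigned char& operator[](std::size_t i) { return data_[i]; }
    const unsigned char& operator[](std::size_t i) const { return data_[i]; }

private:
    std::unique_ptr<unsigned char[]> data_;
    std::size_t size_;
};
```

Because the unique_ptr member deletes the copy operations, an accidental copy becomes a compile error instead of a silent duplication, and there is no resize, push_back or insert to misuse.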

There's nothing stopping you from wrapping it in a class and implementing `size()` yourself. That also allows you to define copy/move semantics and the like. — PhantomPilot, Mar 10 '23 at 02:36
someone already made and uses exactly what I want: https://stackoverflow.com/a/72647537/21354715 — Dylan Bradshaw, Mar 10 '23 at 12:56

Dylan Bradshaw · Answer 2 · answered Mar 10 '23 at 23:31

Runtime sized array:

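#include <cstddef>
#include <cstdlib>
#include <memory>
#include <mutex>
#include <new>
#include <shared_mutex>
#include <unordered_map>

// Runtime-sized array for trivial element types (char, std::byte, ...).
// The length is kept in a static side table keyed by address, so the object
// itself is nothing but the malloc'd element storage.
template<typename T>
class dynarray {
private:
    static std::shared_mutex m;
    static std::unordered_map<const void*, std::size_t> sizes;
    explicit dynarray(std::size_t n) {
        std::unique_lock unique(m);
        sizes.insert_or_assign(this, n);
    }
public:
    static dynarray* creator(std::size_t n) {
        // Always allocate room for at least one T so that n == 0 is valid.
        void* p = std::malloc((n ? n : 1) * sizeof(T));
        if (!p) throw std::bad_alloc();
        return new (p) dynarray(n);
    }
    // No destructors are run, hence the restriction to trivial element types.
    static void deleter(void* ptr) {
        std::unique_lock unique(m);
        sizes.erase(ptr);
        std::free(ptr);
    }
    std::size_t size() const {
        std::shared_lock shared(m);
        return sizes.at(this);
    }
    // No bounds checking: index must be less than size().
    T& operator[](std::size_t index) {
        return data[index];
    }
    T data[1]; // first element; the rest follow in the same allocation
};

template<typename T> std::unordered_map<const void*, std::size_t> dynarray<T>::sizes;
template<typename T> std::shared_mutex dynarray<T>::m;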

Unique array:

template<typename T>
auto make_unique_array(size_t n) {
    return std::unique_ptr<dynarray<T>, void(*)(void*)>(dynarray<T>::creator(n), dynarray<T>::deleter);
}

Example:

std::string abc = "abc";
auto buffer = make_unique_array<char>(abc.size() + 1);
memcpy(buffer.get(), abc.c_str(), abc.size() + 1);
    
(*buffer)[0]; // 'a'
(*buffer)[1]; // 'b'
(*buffer)[2]; // 'c'
buffer->size(); // 4 (includes the terminating '\0')
puts((char*) buffer.get()); // prints abc

Warning: I haven't tested it properly.
