6

I have a function that returns a std::vector<std::byte>

I am aware that std::byte is not a character type nor an integral type, and that converting it to char is only possible through a typecast. So far so good.

So I would like (in cases where I know that the vector only contains character data) to transfer ownership of the underlying buffer from the std::vector<std::byte> to a std::vector<char> using std::move, so as to avoid copying the entire underlying buffer.

When I try doing this, I get this error:

no suitable user-defined conversion from "std::vector<std::byte, std::allocator<std::byte>>" to "std::vector<char, std::allocator<char>>" exists

Is this at all possible in C++? I think there are real use cases where one would want to do this.

matt
  • can you please post a minimal reproducible example? – Alberto Sinigaglia Jun 21 '20 at 22:19
  • I think you can't, because once you've defined them as `std::byte`s you can't move them as a new type. Moving means transferring ownership, i.e. the elements themselves are handed over, so you could only do what you want if you could change the type of an object without copying. – asmmo Jun 21 '20 at 22:31
  • @Berto99 Sure. Let's say we have `std::vector<std::byte> bytevec`. This vector only contains character data (i.e. data which can be represented as char). I would like to be able to do something like `std::vector<char> charvec = std::move(bytevec)` so that the underlying buffer with the data is transferred from bytevec to charvec without actually copying the data. Of course the compiler complains because std::byte cannot be implicitly converted to char, so I was wondering if there is a way to 'cast' it while using std::move? – matt Jun 21 '20 at 22:33
  • so you would not like to use a for loop to move the bytes, instead you want to swap just the pointers? – Alberto Sinigaglia Jun 21 '20 at 22:35
  • I don't think this is possible. Casting may work in practice (or it may not) but it will be undefined technically. I think it may be worth investing in developing a `std::vector` like type that has the ability to *adopt* external memory. Or consider [std::span](https://en.cppreference.com/w/cpp/container/span). – Galik Jun 21 '20 at 22:36
  • @Berto99 Yes I'd like the char vector to take ownership of the data from the byte vector. After copying the pointers from the origin object to the target object, the pointers of the origin object would be set to something like null to indicate that it no longer 'owns' that memory (more info about std::move here: https://en.cppreference.com/w/cpp/utility/move) – matt Jun 21 '20 at 22:45
  • @Galik It would make sense to have a std::vector class that can adopt external memory. Do you know why the STL vector class does not have this feature yet? – matt Jun 21 '20 at 22:45
  • @matt I doubt the STL will ever have such features as it is a very general purpose library. If you want to do low-level (advanced/dangerous) stuff you're on your own (or find a library that supports that). – Galik Jun 21 '20 at 22:58
  • @Galik Got it. Thanks :) – matt Jun 21 '20 at 23:11
  • @Galik plz, can you tell me if my previous comment is true or false and explain it? – asmmo Jun 21 '20 at 23:14
  • @matt Would it be ok to create a class that keeps a reference to the `vector<byte>` and provides a `char` (or `uint8_t`) interface that deals with the casting back and forth? I think that's the route I'd take. – Ted Lyngmo Jun 21 '20 at 23:21
  • @matt: "*It would make sense to have a std::vector class that can adopt external memory.*" But `vector<T>` doesn't store memory; it stores an array of *`T`s*. It is reasonable to allow a `vector<T>` instance to adopt the storage from another `vector<T>` instance. It makes far less sense for it to be able to adopt the "memory" of some unrelated type `vector<U>`, since that is an array of `U`s, which is not an array of `T`s. – Nicol Bolas Jun 22 '20 at 00:19
  • @NicolBolas: That's true in cases where an array of `U`s is not an array of `T`s. But the strict aliasing rule makes that a bit more complicated than simply `is_same_type_v<remove_const<T>, remove_const<U>>`, because e.g. signed and unsigned variations of the same type are guaranteed to be representation-compatible and valid for aliasing, so any time you have an `unsigned[N]` you do in fact also have an `int[N]` and vice versa. Here the question involves compatibility between `std::byte` and `char`. – Ben Voigt Jun 22 '20 at 20:27
  • @BenVoigt: Just because you can alias doesn’t mean you can do pointer arithmetic—but morally you perhaps *should* be able to (to access the underlying storage); there are proposals in flight for the latter (at least for `std::byte` or so). – Davis Herring Jun 24 '20 at 00:55
  • @DavisHerring: According to the pointer arithmetic rules, it is allowed as long as you don't leave the bounds of the parent object. Which in this case is the array. – Ben Voigt Jun 24 '20 at 14:44

3 Answers

7

I would probably leave the data in the original vector<byte> and make a small class that keeps a reference to the original vector<byte> and does the necessary casting when you need it.

Example:

#include <cstddef>
#include <iostream>
#include <vector>

template<typename T>
struct char_view {
    explicit char_view(std::vector<T>& bytes) : bv(bytes) {}

    char_view(const char_view&) = default;
    char_view(char_view&&) = delete;
    char_view& operator=(const char_view&) = delete;
    char_view& operator=(char_view&&) = delete;

    // capacity
    size_t element_count() const { return bv.size(); }
    size_t size() const { return element_count() * sizeof(T); }

    // direct access
    auto data() const { return reinterpret_cast<const char*>(bv.data()); }
    auto data() { return reinterpret_cast<char*>(bv.data()); }

    // element access
    char operator[](size_t idx) const { return data()[idx]; }
    char& operator[](size_t idx) { return data()[idx]; }

    // iterators - with possibility to iterate over individual T elements
    using iterator = char*;
    using const_iterator = const char*;

    const_iterator cbegin(size_t elem = 0) const { return data() + elem * sizeof(T); }
    const_iterator cend(size_t elem) const { return data() + (elem + 1) * sizeof(T); }
    const_iterator cend() const { return data() + size(); }

    const_iterator begin(size_t elem = 0) const { return cbegin(elem); }
    const_iterator end(size_t elem) const { return cend(elem); }
    const_iterator end() const { return cend(); }
    
    iterator begin(size_t elem = 0) { return data() + elem * sizeof(T); }
    iterator end(size_t elem) { return data() + (elem + 1) * sizeof(T); }
    iterator end() { return data() + size(); }

private:
    std::vector<T>& bv;
};

int main() {
    using std::byte;

    std::vector<byte> byte_vector{byte{'a'}, byte{'b'}, byte{'c'}};

    char_view cv(byte_vector);

    for(char& ch : cv) {
        std::cout << ch << '\n';
    }
}

Output:

a
b
c

A simpler option if you only need const access could be to create a string_view:

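#include <string_view>

// Non-owning, read-only view of the vector's bytes as chars.
// The view is only valid while v is alive and not reallocated.
template<typename T>
std::string_view to_string_view(const std::vector<T>& v) {
    return {reinterpret_cast<const char*>(v.data()), v.size() * sizeof(T)};
}
//...
auto strv = to_string_view(byte_vector);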
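
For anyone who wants to try the read-only route in isolation, here is a minimal sketch specialised to `std::vector<std::byte>`. It assumes C++17, and the names `as_chars`, `starts_with` and `demo` are made up for illustration; they are not part of the answer above or of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper: non-owning, read-only char view over a byte vector.
// The view dangles if `v` is destroyed or reallocates.
inline std::string_view as_chars(const std::vector<std::byte>& v) {
    return {reinterpret_cast<const char*>(v.data()), v.size()};
}

// Hypothetical helper: checks whether the buffer begins with `prefix`,
// using only std::string_view operations available in C++17.
inline bool starts_with(const std::vector<std::byte>& v, std::string_view prefix) {
    return as_chars(v).substr(0, prefix.size()) == prefix;
}

// Small self-check showing the intended use; no data is copied.
inline void demo() {
    const std::vector<std::byte> buf{std::byte{'a'}, std::byte{'b'}, std::byte{'c'}};
    assert(as_chars(buf) == "abc");
    assert(starts_with(buf, "ab"));
}
```

Nothing here transfers ownership: the `vector<byte>` stays the owner and the `string_view` merely looks at its buffer, so the vector has to outlive the view.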
Ted Lyngmo
  • Off-topic: user-defined byte literals would probably be neater than C style casts – Aykhan Hagverdili Jun 21 '20 at 23:37
  • While we're at it, let's generalize this view thing to help spread UB https://gcc.godbolt.org/z/TR6rsy – Aykhan Hagverdili Jun 21 '20 at 23:41
  • @_Static_assert "_user-defined byte literals_" I totally agree. I tried to keep it short and I couldn't find any pre-defined user-defined byte literals. "_let's generalize this view thing to help spread UB_" :-) I think that by staying at `char` we should be fine? – Ted Lyngmo Jun 21 '20 at 23:46
  • Sure, reinterpreting as char array should be fine, but where's the fun in that? :) – Aykhan Hagverdili Jun 21 '20 at 23:48
  • Is there some good reason why some move/conversion isn't supported in the language? I run into this frequently and it's a PITA. – Andrew Jan 06 '22 at 17:46
  • @Andrew what do you mean by move/conversion? – Ted Lyngmo Jan 06 '22 at 18:05
  • @TedLyngmo: std::vector v(std::vector&& other) for example, as a move. vector byte buffers are vector byte buffers, whether the template type is char, unsigned char, uint8_t, std::byte, signed char. – Andrew Jan 06 '22 at 18:44
  • @Andrew I see. When it comes to `unsigned char` and `uint8_t`, you can use them interchangeably (as long as `uint8_t` exists at all of course). When it comes to the others, they are distinct types so just letting that move through opens up for mistakes. I rarely find this to be an obstacle though. Perhaps you could ask a question about it to get some more in-depth answers? – Ted Lyngmo Jan 06 '22 at 20:25
2

std::vector does not allow attaching to or detaching from memory allocations, other than moves from a vector of exactly the same type. This has been proposed, but people raised (valid) objections about the allocator for attaching and so on.

The function returning vector<byte> constrains you to work with a vector<byte> as your data container unless you want to copy the data out.

Of course, you can alias the bytes as char in-place for doing character operations.
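
As a rough sketch of those two options (aliasing in place versus copying out), something like the following could work. It assumes C++17 and ASCII data; `to_upper_in_place` and `copy_to_chars` are hypothetical names chosen for this example. Indexing through the `char*` alias is widely relied on in practice, though the comments under the question note that its exact standing in the standard is debated.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <vector>

// Hypothetical helper: upper-cases the buffer in place through a char alias.
// No allocation and no copy; the vector<byte> remains the owner.
inline void to_upper_in_place(std::vector<std::byte>& v) {
    char* p = reinterpret_cast<char*>(v.data());
    for (std::size_t i = 0; i < v.size(); ++i) {
        // std::toupper expects a value representable as unsigned char
        p[i] = static_cast<char>(std::toupper(static_cast<unsigned char>(p[i])));
    }
}

// Hypothetical helper: the copying alternative. One allocation and one pass
// over the data, after which the result is an independent vector<char>.
inline std::vector<char> copy_to_chars(const std::vector<std::byte>& v) {
    const char* p = reinterpret_cast<const char*>(v.data());
    std::vector<char> out(p, p + v.size());
    assert(out.size() == v.size());
    return out;
}
```

The first function is what "alias in place" amounts to; the second is the price of actually ending up with a `std::vector<char>`, since the standard container offers no way to hand the buffer over.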

M.M
-6

You can achieve this with a cast, as shown below. This is legal because the cast is to a char reference (if casting to any other type it would be UB) but, with gcc at least, you still have to compile it with -fno-strict-aliasing to silence the compiler warning. Anyway, here's the cast:

std::vector <char> char_vector = reinterpret_cast <std::vector <char> &&> (byte_vector);

And here's a live demo

Paul Sanders