
Is it possible to manipulate an std::vector<unsigned char> through its data pointer as if it were a container of float?

Here is an example that compiles and (seemingly?) runs as desired (GCC 4.8, C++11):

#include <iostream>
#include <vector>

int main()
{
    std::vector<unsigned char> bytes(2 * sizeof(float));
    auto ptr = reinterpret_cast<float *>(bytes.data());
    ptr[0] = 1.1;
    ptr[1] = 1.2;
    std::cout << ptr[0] << ", " << ptr[1] << std::endl;
    return 0;
}

This snippet successfully writes/reads data from the byte buffer as if it were an array of float. From reading about reinterpret_cast I'm afraid that this might be undefined behavior, but I don't understand the type aliasing rules well enough to be sure.

Is the code snippet undefined behavior as outlined above? If so, is there another way to achieve this sort of byte manipulation?

Stargazer
  • I'd say it breaks strict aliasing. The best solution is to create a new vector (of `float`) of the correct size, and do a byte-wise copy from the byte vector to the float vector (i.e. `memcpy`). – Some programmer dude May 20 '20 at 11:23
  • @Gupta this is a simplified example of my use case, I'm not sure if me taking the time to explain is worth it. Long story short this has to do with the fact that what is 'float' in this example is in fact a type that is only known during runtime. – Stargazer May 20 '20 at 11:27
  • @Groo I am planning to use `bytes` afterwards. More accurately, a library is using its data pointer to read the floats. – Stargazer May 20 '20 at 11:29
  • @Groo No it isn't. – Asteroids With Wings May 20 '20 at 11:32
  • @Gupta No it isn't. – Asteroids With Wings May 20 '20 at 11:32
  • @Gupta If you could post an answer with a link to the relevant wording of the standard or at [cppreference.com](https://cppreference.com) that would be great – Stargazer May 20 '20 at 11:33
  • @Groo I am casting and passing the pointer on to a library routine accepting `void *` – Stargazer May 20 '20 at 11:38
  • @Stargazer Seems like a `reinterpret_cast<void*>(vec.data())` on a `std::vector<float>` would be more appropriate then – Asteroids With Wings May 20 '20 at 11:40
  • @AsteroidsWithWings see my first comment towards Gupta for why things aren't that easy. – Stargazer May 20 '20 at 11:42
  • @Stargazer Then you're stuck with antics :) (Please fully explain the use case in the question.) – Asteroids With Wings May 20 '20 at 11:42
  • @Stargazer: you are allowed to do the reverse (cast the `float*` to `char*` for byte inspection), although the writers of the library will likely rely on UB in that case. So a defined way would be to create an array of floats and then `reinterpret_cast` it before passing to the library. – vgru May 20 '20 at 11:47

2 Answers


Legal answer

No, this is not permitted.

C++ isn't just "a load of bytes" — the compiler (and, more abstractly, the language) have been told that you have a container of unsigned chars, not a container of floats. No floats exist, and you can't pretend that they do.

The rule you're looking for, which is known as strict aliasing, may be found under [basic.lval]/8.

The opposite would work, because it is permitted (via a special rule in that same paragraph) to examine the bytes of any type via an unsigned char*. But in your case, the quickest safe and correct way to "get" a float from something that starts life as unsigned char is to std::memcpy or std::copy those bytes into an actual float that exists:

#include <cstring>  // std::memcpy
#include <vector>

int main()
{
    // Value-initialised, so every byte starts as zero
    std::vector<unsigned char> bytes(2 * sizeof(float));
    float f1, f2;

    // Extracting values: copy the bytes into floats that really exist
    std::memcpy(
       reinterpret_cast<unsigned char*>(&f1),
       bytes.data(),
       sizeof(float)
    );

    std::memcpy(
       reinterpret_cast<unsigned char*>(&f2),
       bytes.data() + sizeof(float),
       sizeof(float)
    );

    // Putting them back: copy the floats' bytes into the buffer
    f1 = 1.1f;
    f2 = 1.2f;

    std::memcpy(
       bytes.data(),
       reinterpret_cast<unsigned char*>(&f1),
       sizeof(float)
    );

    std::memcpy(
       bytes.data() + sizeof(float),
       reinterpret_cast<unsigned char*>(&f2),
       sizeof(float)
    );
}

This is fine as long as those bytes form a valid representation of float on your system. Granted it looks a little unwieldy, but a quick wrapper function will make short work of it.
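One possible shape for such a wrapper is sketched below. The names load_as and store_as are hypothetical (they are not from the standard library or from the question), and the sketch assumes the caller guarantees that the buffer is large enough, since no bounds checking is done.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical helper: reads the index-th T out of a byte buffer by copying
// its bytes into a real T object. The caller must ensure that
// (index + 1) * sizeof(T) <= bytes.size().
template <typename T>
T load_as(const std::vector<unsigned char>& bytes, std::size_t index)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "std::memcpy requires a trivially copyable type");
    T value;
    std::memcpy(&value, bytes.data() + index * sizeof(T), sizeof(T));
    return value;
}

// Hypothetical helper: writes a T into the index-th slot of a byte buffer.
// The same size precondition applies.
template <typename T>
void store_as(std::vector<unsigned char>& bytes, std::size_t index, const T& value)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "std::memcpy requires a trivially copyable type");
    std::memcpy(bytes.data() + index * sizeof(T), &value, sizeof(T));
}
```

With these, the question's two assignments would become store_as<float>(bytes, 0, 1.1f) and store_as<float>(bytes, 1, 1.2f), and reading back would be load_as<float>(bytes, 0). Compilers typically turn such small fixed-size memcpy calls into plain loads and stores, so the wrapper is unlikely to cost anything at run time.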

A common alternative, assuming you only care about floats and don't need a resizable buffer, is to produce some std::aligned_storage then do a bunch of placement new into the resulting buffer. Since C++17, you could alternatively play around with std::launder, though resizing the vector (read: reallocating its buffer) would also be inadvisable in that scenario.
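A minimal sketch of the placement-new idea follows. It swaps in an alignas-qualified unsigned char array for std::aligned_storage (which is deprecated as of C++23), and the TwoFloats type is purely illustrative. The floats are only ever accessed through the pointers that placement new returns, which sidesteps the question of whether a cast pointer into the buffer may be used.

```cpp
#include <new>

// Hypothetical illustration: a fixed-size, suitably aligned byte buffer in
// which two float objects are created with placement new.
struct TwoFloats {
    alignas(float) unsigned char storage[2 * sizeof(float)];
    float* a;  // points at the float living in the first slot
    float* b;  // points at the float living in the second slot

    TwoFloats()
        : a(::new (storage) float(1.1f)),
          b(::new (storage + sizeof(float)) float(1.2f))
    {
    }

    // Copying would leave a and b pointing into the source object's storage.
    TwoFloats(const TwoFloats&) = delete;
    TwoFloats& operator=(const TwoFloats&) = delete;

    // No destructor is needed: float is trivially destructible.
};
```

After construction, *a and *b are ordinary float objects, and the bytes of storage may still be inspected through unsigned char*.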

Also, these approaches are quite involved and result in complex code that not all your readers will be able to follow. If you can launder your data such that it "is" a sequence of floats, you may as well just make yourself a nice std::vector<float> in the first place. Per the above, it is permitted to get and use an unsigned char* to that buffer if you wish.

It ought to be noted that there is much code out there in the wild that uses your original approach (particularly in older projects with a barebones C heritage). On many implementations, it may appear to work. But it is a common misconception that it is valid and/or safe: the optimiser is entitled to assume that a float* and an unsigned char buffer's elements are not being used to alias each other this way, so relying on it exposes you to instruction "re-ordering" and other optimisations that break the code.


Hedge-betting answer

For what it's worth, if you disable strict aliasing (GCC and Clang both accept -fno-strict-aliasing), then you can probably get away with your original code. That is a guarantee from the compiler, not from the C++ standard, so just be careful.

Asteroids With Wings

Is it possible to manipulate an std::vector<unsigned char> through its data pointer as if it were a container of float?

Not quite. Your example has UB indeed.

However, you can reuse the storage of those bytes to create the floats there. Example:

float* ptr = std::launder(reinterpret_cast<float*>(bytes.data()));
std::uninitialized_fill_n(ptr, 2, 0.0f);

After this, the lifetime of the unsigned char objects has ended, and there are floats there instead. Using ptr is well defined.

Whether this would be useful for you is another matter. Start with a simpler design first: Why not simply use std::vector<float>?

eerorika
  • How well does this approach interact with `vector`'s destruction (and/or reallocations), if applicable? – Asteroids With Wings May 20 '20 at 11:43
  • @AsteroidsWithWings Reallocations: I don't know, maybe fine, maybe technically not; I would recommend to not do reallocations. Destruction: Fine because both unsigned char and float are trivial. – eerorika May 20 '20 at 11:46
  • Yep, indeed, agreed. – Asteroids With Wings May 20 '20 at 11:47
  • @eerorika The snippet in the question is a simplified version of a tougher case I came across in practice. I can solve the problem using other approaches but they are ugly and verbose. Were byte manipulation as I desired legal then it would be the most elegant solution. – Stargazer May 20 '20 at 11:49
  • @Stargazer For what it's worth, if you disable strict aliasing (GCC permits this as an extension), then you can probably get away with it. But be careful. (Just realised this and added it to my answer :D) – Asteroids With Wings May 20 '20 at 11:51