What is the safe approach to convert incoming network `char*` data to `uint8_t` and back

Question

This question on SO deals with the char <-> uint8_t issue mainly from the perspective of the Strict Aliasing Rule. Roughly speaking, it clarifies that as long as uint8_t is implemented as either char or unsigned char, we're fine. I'm interested in understanding whether the possible mismatch between the signedness of uint8_t and that of char matters when using reinterpret_cast.

When I need to deal directly with bytes, I prefer using uint8_t. However, the Winsock API deals with char*s.

I would like to understand how to handle these conversions correctly, so that I don't run into Undefined Behavior or other phenomena that hurt the portability of the app.

The following function takes a std::array<uint8_t, 4> and converts it to a uint32_t, i.e., it takes 4 bytes and combines them into an integer.

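```cpp
#include <array>
#include <cstdint>
// Widen each byte to uint32_t before shifting so the arithmetic stays unsigned
uint32_t bytes_to_u32(const std::array<uint8_t, 4>& bytes) {
    return (uint32_t{bytes[0]} << 24) | (uint32_t{bytes[1]} << 16) | (uint32_t{bytes[2]} << 8) | bytes[3];
}
```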
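bytes_to_u32 treats bytes[0] as the most significant byte. In other words, it assumes the sender uses big-endian (network) byte order. If a protocol instead sends the least significant byte first, only the shift positions change. The following is a minimal sketch. `bytes_to_u32_le` is a hypothetical name, and each byte is widened to uint32_t before shifting so that no shift ever operates on a promoted signed int:

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical little-endian counterpart of bytes_to_u32:
// here bytes[0] is the LEAST significant byte.
std::uint32_t bytes_to_u32_le(const std::array<std::uint8_t, 4>& bytes) {
    // Widen each byte to uint32_t first so the shifts stay in unsigned arithmetic.
    return (std::uint32_t{bytes[3]} << 24) | (std::uint32_t{bytes[2]} << 16) |
           (std::uint32_t{bytes[1]} << 8) | std::uint32_t{bytes[0]};
}
```

For the bytes 00 00 01 FF, the big-endian reading is 0x000001FF (511), while this little-endian reading gives 0xFF010000. Both sides must agree on which order is used.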

However, the recv function writes the data incoming from the socket into a char* buffer.

One approach is the following:

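```cpp
std::array<uint8_t, 4> length_buffer;
int bytes_received = 0;
while (bytes_received < 4) {
    int n = recv(sock, reinterpret_cast<char*>(length_buffer.data()) + bytes_received, 4 - bytes_received, 0);
    if (n <= 0) break; // 0: connection closed, SOCKET_ERROR: failure; the buffer is then incomplete
    bytes_received += n;
}
```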
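A receive loop like this one must also handle recv returning 0 (the peer closed the connection) or SOCKET_ERROR. Otherwise it can spin forever or add a negative count to the offset.

The sketch below factors the loop into a helper so it can be exercised without a real socket. `recv_exact` and `OneByteAtATime` are hypothetical names. The callable stands in for recv and follows the same return convention: bytes read, 0 on close, negative on error. With Winsock, the callable would wrap a call such as `recv(sock, buf, len, 0)`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: call `recv_fn` until exactly `len` bytes are in `buf`.
// `recv_fn(char* buf, int len)` mimics recv(): it returns the number of bytes
// read, 0 if the peer closed the connection, or a negative value on error.
template <typename RecvFn>
bool recv_exact(RecvFn&& recv_fn, char* buf, int len) {
    assert(len >= 0);
    int got = 0;
    while (got < len) {
        int n = recv_fn(buf + got, len - got);
        if (n <= 0) return false;  // closed or failed: the buffer is incomplete
        got += n;
    }
    return true;
}

// Hypothetical test double: serves a fixed byte sequence one byte per call,
// so the partial-read path of recv_exact is exercised.
struct OneByteAtATime {
    const unsigned char* data;
    std::size_t size;
    std::size_t pos = 0;

    int operator()(char* buf, int len) {
        if (len <= 0 || pos >= size) return 0;  // behaves like a closed peer
        std::memcpy(buf, data + pos, 1);
        ++pos;
        return 1;
    }
};
```

The caller still passes `reinterpret_cast<char*>(length_buffer.data())` exactly as in the question. The helper only makes the end-of-stream and error cases explicit through its bool result.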

It seems to work on my machine. But is this safe? If I'm not mistaken, on a different machine or compiler a char may be signed, meaning length_buffer would hold wrong values after the conversion. Am I wrong?

I know that reinterpret_cast does not change the bit pattern at all; the binary data stays the same. Even knowing this, I'm still not fully convinced that this technique is the right way to go.
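Only the bytes are copied, so the signedness of char matters only if the buffer is later read back *through* a char.

The sketch below stores 0x7F, 0x80 and 0xFF through a char* into uint8_t storage. `std::memcpy` stands in for recv filling the buffer, and the helper names are hypothetical. Read as uint8_t, the values are always 127, 128 and 255. Read as plain char, 0xFF comes out as -1 where char is signed (two's complement) and 255 where it is unsigned.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Stand-in for recv(): copies raw bytes into a char* destination unchanged.
inline void fill_as_char(char* dst, const unsigned char* src, std::size_t n) {
    std::memcpy(dst, src, n);
}

// Reads byte i through its uint8_t type: always a value in 0..255.
inline int read_as_u8(const std::uint8_t* buf, std::size_t i) {
    return buf[i];
}

// Reads the same byte through a plain char glvalue (char may access any object).
// The resulting value depends on whether char is signed on this implementation.
inline int read_as_char(const std::uint8_t* buf, std::size_t i) {
    return reinterpret_cast<const char*>(buf)[i];
}
```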

Please explain how to approach this problem.

EDIT: Note also that after converting the char* to uint8_t*, I need to be able to turn the uint8_t* data into a valid numeric value, or sometimes test the numeric values of individual bytes in the buffer. I need this to interpret the "commands" sent to me over the network and to send some back to the other side.
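Once the bytes sit in uint8_t storage, individual bytes can be compared against numeric constants directly, and values can be split back into bytes for the reply. The only char* conversion needed is at the send call, for example `send(sock, reinterpret_cast<const char*>(out.data()), static_cast<int>(out.size()), 0)`. By contrast, a signed char holding 0xF0 compares unequal to the literal 0xF0, because it promotes to -16.

The following is a sketch assuming a hypothetical protocol in which the first byte of a message is a command code and integers are sent most significant byte first. `u32_to_bytes`, `is_command` and the `CMD_*` values are invented for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical command codes; CMD_HIGH is deliberately >= 0x80.
constexpr std::uint8_t CMD_PING = 0x01;
constexpr std::uint8_t CMD_HIGH = 0xF0;

// True if the first byte of the message equals `cmd`. Both sides are uint8_t,
// so the comparison is between values in 0..255 regardless of char signedness.
bool is_command(const std::uint8_t* msg, std::uint8_t cmd) {
    return msg[0] == cmd;
}

// Inverse of bytes_to_u32: splits `v` into 4 bytes, most significant first.
std::array<std::uint8_t, 4> u32_to_bytes(std::uint32_t v) {
    return {{static_cast<std::uint8_t>(v >> 24), static_cast<std::uint8_t>(v >> 16),
             static_cast<std::uint8_t>(v >> 8), static_cast<std::uint8_t>(v)}};
}
```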

Have you experienced any errors? The `char` type is always one byte in size, so a conversion between `uint8_t` and `char` should always be safe. If you need to test at compile time whether they are convertible (I don't know for sure whether I'm right), you can always try to static_cast from one to the other. — J. Lengel, Jun 21 '20 at 13:53
@J.Lengel "The char type is always one byte in size": no. It *usually* is, but strictly speaking it is [CHAR_BIT](https://en.cppreference.com/w/cpp/types/climits) bits in size. You are not guaranteed 8 bits on all implementations. If you want an 8-bit byte, there's [std::byte](https://en.cppreference.com/w/cpp/types/byte) for that. — Jesper Juhl, Jun 21 '20 at 13:59
@JesperJuhl, `char` is always one byte, `static_assert(sizeof(char) == 1)`. But the number of bits in one byte [might not be 8](https://en.cppreference.com/w/cpp/language/sizeof#Notes): *Depending on the computer architecture, a byte may consist of 8 or more bits, the exact number being recorded in `CHAR_BIT`.* — Evg, Jun 21 '20 at 14:04
@JesperJuhl no, char is always exactly one byte. std::byte is always the same size as char. The size of the byte is indeed not necessarily 8 bits. The difference between char and std::byte is that the latter is neither a character type nor an integer type, while the former is both. — eerorika, Jun 21 '20 at 14:06
@Evg and eerorika Right you are. I got that a little mixed up. Thank you for the correction. — Jesper Juhl, Jun 21 '20 at 14:06
I don't fully understand your concerns, but I don't see any problems with this approach. — HolyBlackCat, Jun 21 '20 at 14:09
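The size questions raised in the comments can be checked at compile time. std::uint8_t is an optional type. It exists only if the implementation has an unsigned integer type of exactly 8 bits with no padding, so a program that compiles with it already implies CHAR_BIT == 8. The assertions below just make that assumption explicit.

The last assertion documents that uint8_t is unsigned char, as it is on mainstream toolchains. That matters for the reverse direction (reading a char buffer through a uint8_t*), because only char, unsigned char and std::byte may be used to access arbitrary storage. The constant name `kCharIsSigned` is hypothetical.

```cpp
#include <climits>
#include <cstdint>
#include <type_traits>

// Implied by std::uint8_t existing at all, but stated explicitly here.
static_assert(CHAR_BIT == 8, "code assumes 8-bit bytes");
static_assert(sizeof(char) == 1 && sizeof(std::uint8_t) == 1, "one byte each");

// Documented assumption: uint8_t is a typedef for unsigned char, so a uint8_t*
// may be used to read bytes that were written through a char*.
static_assert(std::is_same<std::uint8_t, unsigned char>::value,
              "uint8_t is not unsigned char on this implementation");

// Plain char signedness is implementation-defined; it can be queried like this.
constexpr bool kCharIsSigned = std::is_signed<char>::value;
```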

score 0 · Answer 1 · answered Jun 21 '20 at 17:48

I hope I understood your question correctly. You can solve this problem using unions:

```cpp
#include <winsock2.h> // SOCKET, recv
#include <cstddef>
#include <cstdint>

// Union is a template, so you can use it for any given type
template<typename T>
union ConvertBytes
{
    T value;
    char byte[sizeof(T)];
};

void process(SOCKET sock)
{
    char buffer[sizeof(uint32_t)];
    recv(sock, buffer, static_cast<int>(sizeof(buffer)), 0); // Receive data (check the return value in real code)

    ConvertBytes<uint32_t> converter;
    for (std::size_t i = 0; i < sizeof(uint32_t); i++) // Assuming you receive only that one uint32_t
    {
        converter.byte[i] = buffer[i]; // Copy all bytes into the union
    }
    uint32_t result = converter.value; // Get the uint32_t value from the union
}
```

This kind of type punning is [allowed in C, but not in C++](https://stackoverflow.com/questions/25664848/unions-and-type-punning). — HolyBlackCat, Jun 22 '20 at 07:58
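A commonly used well-defined alternative in C++ is std::memcpy into a trivially copyable object. C++20 also offers std::bit_cast in <bit>. The following is a sketch, and `from_bytes` is a hypothetical name.

Like the union, it keeps the bytes in the order they arrived, so the value it produces depends on the host's byte order. Big-endian wire data on a little-endian host still needs swapping (for example with ntohl), or can be decoded with shifts as bytes_to_u32 in the question does.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical helper: builds a T from the first sizeof(T) bytes at `bytes`.
// std::memcpy into a trivially copyable object is well-defined in C++,
// unlike reading a union member other than the one last written.
// Bytes are used in the order given, so the value depends on host endianness.
template <typename T>
T from_bytes(const char* bytes) {
    static_assert(std::is_trivially_copyable<T>::value, "T must be trivially copyable");
    T value;
    std::memcpy(&value, bytes, sizeof(T));
    return value;
}
```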
