c++ combining 2 uint8_t into one uint16_t not working?

Question

So I have a little piece of code that takes 2 uint8_t's and places them next to each other, and then returns a uint16_t. The point is not adding the 2 variables, but putting them next to each other and creating a uint16_t from them. When the first uint8_t is 0 and the second uint8_t is 1, I expect the uint16_t to also be 1. However, this is not the case in my code. This is my code:

uint8_t *bytes = new uint8_t[2];
bytes[0] = 0;
bytes[1] = 1;
uint16_t out = *((uint16_t*)bytes);

It is supposed to make the bytes uint8_t pointer into a uint16_t pointer, and then take the value. I expect that value to be 1 since x86 is little endian. However it returns 256. Setting the first byte to 1 and the second byte to 0 makes it work as expected. But I am wondering why I need to switch the bytes around in order for it to work.

Can anyone explain that to me?

Thanks!

You tagged this question with "endianness". What exactly do you not understand then? Because endianness is basically the answer. — Rakete1111, Jul 25 '19 at 11:32
"Little endian" means `0xABCD` is arranged as `0xCD, 0xAB`. Are you confusing the two kinds of endianness? — L. F., Jul 25 '19 at 11:33
Yeah I was confusing them. Thought little endian meant it ended with the little bit. — Its-a-me-mario, Jul 25 '19 at 11:46
Yeah I hate the names. I _always_ get them mixed up in my brain for exactly that reason. Why the "end" means the "first end" is beyond me. Try to remember that big endian is how you write numbers in the Latin system (e.g. 12345) (i.e. for many of us, "normal") and then just work from there — Lightness Races in Orbit, Jul 25 '19 at 11:51

eerorika · Accepted Answer · 2019-07-25T13:54:49.610

4

There is no uint16_t or compatible object at that address, and so the behaviour of *((uint16_t*)bytes) is undefined.

I expect that value to be 1 since x86 is little endian. However it returns 256.

Even if the program were fixed to have well-defined behaviour, your expectation is backwards. In little endian, the least significant byte is stored at the lowest address. Thus the 2-byte value 1 is stored as 1, 0 and not 0, 1.
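To see the layout described above without relying on undefined behaviour, here is a minimal sketch that copies the bytes of a real uint16_t into an array with std::memcpy. The helper names bytes_in_memory and is_little_endian are made up for illustration and are not from the question.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Copies the object representation of a 16-bit value into a byte array,
// in the order the bytes sit in memory (lowest address first).
std::array<std::uint8_t, 2> bytes_in_memory(std::uint16_t value) {
    std::array<std::uint8_t, 2> bytes{};
    std::memcpy(bytes.data(), &value, sizeof value);
    return bytes;
}

// True when the least significant byte is stored at the lowest address.
bool is_little_endian() {
    return bytes_in_memory(1)[0] == 1;
}
```

On a little-endian machine such as x86, bytes_in_memory(1) should yield 1, 0 and bytes_in_memory(256) should yield 0, 1.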

Does endianness also affect the order of the bits in the byte or not?

There is no way to access a bit by "address"¹, so there is no concept of endianness. When converting to text, bits are conventionally shown most significant on left and least on right; just like digits of decimal numbers. I don't know if this is true in right to left writing systems.

¹ You can sort of create "virtual addresses" for bits using bitfields. The order of bitfields i.e. whether the first bitfield is most or least significant is implementation defined and not necessarily related to byte endianness at all.

Here is a correct way to set two octets as uint16_t. The result will depend on endianness of the system:

// no need to complicate a simple example with dynamic allocation
uint16_t out;
// note that there is an exception in language rules that
// allows accessing any object through narrow (unsigned) char
// or std::byte pointers; thus following is well defined
std::byte* data = reinterpret_cast<std::byte*>(&out);
// std::byte (C++17, <cstddef>) has no implicit conversion from int
data[0] = std::byte{1};
data[1] = std::byte{0};

Note that assuming the input is in native endianness is usually not a good choice, especially when compatibility across multiple systems is required, such as when communicating over a network or accessing files that may be shared with other systems.

In these cases, the communication protocol or the file format typically specifies that the data is in a specific endianness, which may or may not be the same as the native endianness of your target system. The de facto standard in network communication is big endian. Data in a particular endianness can be converted to native endianness using bit shifts, as shown in Frodyne's answer for example.
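As a rough sketch of that conversion, the two helpers below interpret a pair of octets in a stated byte order and return a native value. The names read_be16 and read_le16 are hypothetical and not taken from any library.

```cpp
#include <cstdint>

// Interprets two octets as a big-endian (network order) 16-bit value:
// the first octet is the most significant.
constexpr std::uint16_t read_be16(const std::uint8_t* p) {
    return static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}

// Interprets two octets as a little-endian 16-bit value:
// the first octet is the least significant.
constexpr std::uint16_t read_le16(const std::uint8_t* p) {
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}
```

Because the shifts operate on values rather than on memory layout, both functions give the same result on any host, whatever its native endianness.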

edited Jul 25 '19 at 13:54

answered Jul 25 '19 at 11:47

eerorika

Indeed, a `std::copy` (or equivalent) is required for well-defined behaviour here. – Lightness Races in Orbit Jul 25 '19 at 11:56
Well, or starting off with a `uint16_t` then writing through a `uint8_t*` into it. Then just print the bastard – Lightness Races in Orbit Jul 25 '19 at 11:57
int's are just a set of bits. So I don't think C++ can tell the difference, right? – Its-a-me-mario Jul 25 '19 at 11:58
@Its-a-me-mario That's not how it works. I suggest you read up on **strict aliasing**: https://stackoverflow.com/questions/98650/what-is-the-strict-aliasing-rule – Max Langhof Jul 25 '19 at 12:04
@LightnessRacesinOrbit Why is what I did undefined behaviour? I don't really get it. – Its-a-me-mario Jul 25 '19 at 12:05
@Its-a-me-mario Because it violates strict aliasing. The first sentence of this answer is the precise reason that follows from the C++ standard, thus your program has undefined behavior. You seem to think that you can only have UB if "something bad" happens due to your violation of the standard, but that's not what UB is. In fact, this may work fine on your compiler but there is no such guarantee from the C++ standard and it may break at any time, because UB means "you violate an assumption that compiler writers are guaranteed by the standard". – Max Langhof Jul 25 '19 at 12:05
@MaxLanghof I read the link, but don't really get it. Does strict aliasing only apply when casting to a different type of different size, or for anything that isn't a class that is derived from something? – Its-a-me-mario Jul 25 '19 at 12:09
@Its-a-me-mario It applies every time you lie to the compiler about what object you expect to find somewhere compared to what object there really is. If you created two `uint16_t`s adjacent in memory and then access that location through a `uint32_t*`, that violates strict aliasing because _no `uint32_t` object exists at that memory location_. From the perspective of the standard, each integer (and everything else) you write to memory is an object, not just some random mass of bits/bytes, and it grants compilers the right to assume that you don't overwrite half the object with something. – Max Langhof Jul 25 '19 at 12:14
@Its-a-me-mario it applies to accessing through any pointer (or reference) reinterpret casted to a different type regardless of size, and regardless of whether it is a class or not. There are exceptions such as `T` can be reinterpreted as `const T` and anything can be reinterpreted through `char` pointer, and standard layout class objects can be reinterpreted as their first member object – eerorika Jul 25 '19 at 12:15
_"int's are just a set of bits. So I don't think C++ can tell the difference, right?"_ This statement is false, and that is why you have UB, and why it matters. Objects aren't just a load of bits. You're not hand-programming a physical computer. C++ is an _abstraction_ and compilers/optimisers are incredibly, incredibly, incredibly complicated. They will take every possible opportunity to make shortcuts, and it will do so in a cutthroat manner that breaks when you violate the contract. Here you pretend a `uint16_t` exists when, academically, it doesn't. And that matters to the compiler. – Lightness Races in Orbit Jul 25 '19 at 12:19
It's a common misconception though so don't feel bad. Many people writing C++ still take a sort of 1970s C approach to programming (it's just some bytes in memory, right?), but the reality is completely different. Just remember that _you are describing the behaviour of a program_ (not actually programming a computer's memory chips) and then you'll be fine. – Lightness Races in Orbit Jul 25 '19 at 12:20
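The copy-based alternative mentioned in the comments above could look roughly like the following. This is a sketch under the assumption that the two input bytes are already in native byte order; from_native_bytes is a made-up name.

```cpp
#include <cstdint>
#include <cstring>

// Builds a uint16_t from two bytes in native byte order without ever
// dereferencing a uint16_t* that points at uint8_t storage.
std::uint16_t from_native_bytes(const std::uint8_t (&bytes)[2]) {
    std::uint16_t out;                     // a real uint16_t object exists here
    std::memcpy(&out, bytes, sizeof out);  // copy the object representation into it
    return out;
}
```

The result still depends on the endianness of the machine (bytes 0, 1 give 256 on little endian), but the program no longer pretends that a uint16_t lives inside the byte array.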

score 3 · Answer 2 · edited Jun 20 '20 at 09:12

In a little-endian system the least significant bytes are placed first. In other words: the low byte is placed at offset 0, and the high byte at offset 1 (and so on). So this:

uint8_t* bytes = new uint8_t[2];
bytes[0] = 1;
bytes[1] = 0;
uint16_t out = *((uint16_t*)bytes);

Produces the out = 1 result you want.

However, as you can see this is easy to get wrong, so in general I would recommend that instead of trying to place stuff correctly in memory and then cast it around, you do something like this:

uint16_t out = lowByte + (highByte << 8);

That will work on any machine, regardless of endianness.
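Wrapped in a function, the same expression might look like this. The name combine is only illustrative, and the static_cast is there because the arithmetic is done in int after integer promotion.

```cpp
#include <cstdint>

// Hypothetical helper: builds a 16-bit value from a low and a high byte.
constexpr std::uint16_t combine(std::uint8_t lowByte, std::uint8_t highByte) {
    return static_cast<std::uint16_t>(lowByte + (highByte << 8));
}

// Checked at compile time, independent of the host's endianness.
static_assert(combine(1, 0) == 1, "low byte 1, high byte 0 gives 1");
static_assert(combine(0, 1) == 256, "low byte 0, high byte 1 gives 256");
```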

Edit: Bit shifting explanation added.

x << y means to shift the bits in x y places to the left (>> moves them to the right instead).

If X contains the bit-pattern xxxxxxxx, and Y contains the bit-pattern yyyyyyyy, then (X << 8) produces the pattern: xxxxxxxx00000000, and Y + (X << 8) produces: xxxxxxxxyyyyyyyy.

(And Y + (X<<8) + (Z<<16) produces zzzzzzzzxxxxxxxxyyyyyyyy, etc.)

A single shift to the left is the same as multiplying by 2, so X << 8 is the same as X * 2^8 = X * 256. That means that you can also do: Y + (X*256) + (Z*65536), but I think the shifts are clearer and show the intent better.

Note that again: Endianness does not matter. Shifting 8 bits to the left will always clear the low 8 bits.

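You can read more here: https://en.wikipedia.org/wiki/Bitwise_operation. Note the difference between arithmetic and logical shifts: in C/C++, right-shifting an unsigned value is a logical shift. Right-shifting a negative signed value is implementation-defined in C and in C++ before C++20 (in practice usually an arithmetic shift), and C++20 defines it as an arithmetic shift.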
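To make the equivalence between shifting and multiplying concrete, here is a small sketch with the three-byte case from above. The function names are invented for this example, and the bytes are widened to a 32-bit unsigned type first so the shifts stay well within range.

```cpp
#include <cstdint>

// Packs three bytes as zzzzzzzz xxxxxxxx yyyyyyyy using shifts.
constexpr std::uint32_t pack_shift(std::uint8_t y, std::uint8_t x, std::uint8_t z) {
    return static_cast<std::uint32_t>(y)
         + (static_cast<std::uint32_t>(x) << 8)
         + (static_cast<std::uint32_t>(z) << 16);
}

// Same result using multiplication by powers of two (2^8 and 2^16).
constexpr std::uint32_t pack_multiply(std::uint8_t y, std::uint8_t x, std::uint8_t z) {
    return y + x * 256u + z * 65536u;
}
```

With y = 0x03, x = 0x02 and z = 0x01, both functions should return 0x010203.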

@Frodyne Could you maybe explain how it works? I don't know bitwise operators yet. — Its-a-me-mario, Jul 25 '19 at 12:11
@Its-a-me-mario Here `operator<<()` is the bitwise shift to the left. Let's suppose you have two bytes, `MSB == 0x05` and `LSB == 0x08`. Then you want to concatenate them into a `uint16_t`. Shifting `MSB` by the size of a byte (i.e. 8) will give you `0x0500` (`static_cast` the result to a `uint16_t` to avoid implicit conversion warnings). Then you add the result to the `LSB` and you obtain `0x0508`, which is the concatenation you desired. — Fareanor, Jul 25 '19 at 12:34
@Its-a-me-mario I just added a brief explanation in an edit. But basically what Fareanor said. — Frodyne, Jul 25 '19 at 12:49

score 2 · Answer 3 · answered Jul 25 '19 at 11:39

If p is a pointer to some multi-byte value, then:

"Little-endian" means that the byte at p is the least-significant byte, in other words, it contains bits 0-7 of the value.
"Big-endian" means that the byte at p is the most-significant byte, which for a 16-bit value would be bits 8-15.

Since Intel x86 is little-endian, bytes[0] contains bits 0-7 of the uint16_t value and bytes[1] contains bits 8-15. Since you are trying to set bit 0, you need:

bytes[0] = 1; // Bits 0-7
bytes[1] = 0; // Bits 8-15
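Going the other way, from a 16-bit value to its two bytes, the two definitions above can be sketched as follows. All four function names are hypothetical and chosen only for this illustration.

```cpp
#include <cstdint>

// Bits 0-7 of the value (the least significant byte).
constexpr std::uint8_t low_byte(std::uint16_t value) {
    return static_cast<std::uint8_t>(value & 0xFF);
}

// Bits 8-15 of the value (the most significant byte).
constexpr std::uint8_t high_byte(std::uint16_t value) {
    return static_cast<std::uint8_t>(value >> 8);
}

// Little-endian layout: the byte at p is the least significant byte.
inline void store_le16(std::uint8_t* p, std::uint16_t value) {
    p[0] = low_byte(value);
    p[1] = high_byte(value);
}

// Big-endian layout: the byte at p is the most significant byte.
inline void store_be16(std::uint8_t* p, std::uint16_t value) {
    p[0] = high_byte(value);
    p[1] = low_byte(value);
}
```

Storing the value 1 with store_le16 gives the bytes 1, 0, while store_be16 gives 0, 1.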

Willis Blackburn

Thanks! I was confusing the endiannesses. I thought little endian meant it ended with the little bit. (Which would explain the name) – Its-a-me-mario Jul 25 '19 at 11:46
Does endianness also affect the order of the bits in the byte or not? – Its-a-me-mario Jul 25 '19 at 11:48
@Its-a-me-mario No. – Lightness Races in Orbit Jul 25 '19 at 11:52

score 1 · Answer 4 · answered Jul 25 '19 at 11:51

Your code works, but you misinterpreted how to read "bytes":

#include <cstdint>
#include <cstddef>
#include <iostream>

int main()
{
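    uint8_t *in = new uint8_t[2];
    in[0] = 3; // low byte on a little-endian machine
    in[1] = 1; // high byte on a little-endian machine
    // Mirrors the cast from the question; as other answers note, it is formally undefined behaviour
    uint16_t out = *((uint16_t*)in);

    // On a little-endian machine both values are expected to be 259 (1*256 + 3)
    std::cout << "out: " << out << "\n in: " << in[1]*256 + in[0]<< std::endl;

    delete[] in; // release the array allocated with new[]
    return 0;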
}

By the way, you should take care of alignment when casting this way.
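One way to sidestep the alignment concern is to copy the bytes out instead of casting the pointer, since std::memcpy has no alignment requirement on its source. A minimal sketch follows; load_u16_at is a made-up name, and the result is in native byte order.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Reads a native-order uint16_t starting at any byte offset in a buffer.
// The caller must make sure that offset + 2 does not exceed the buffer size.
std::uint16_t load_u16_at(const std::uint8_t* buffer, std::size_t offset) {
    std::uint16_t out;
    std::memcpy(&out, buffer + offset, sizeof out);
    return out;
}
```

This also works at odd offsets, where dereferencing a cast uint16_t* could be misaligned on some platforms.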

score 0 · Answer 5 · answered Jul 25 '19 at 12:20

One way to think about numbers is in MSB and LSB order,
where the MSB is the highest bit and the LSB is the lowest bit,
as seen on little-endian machines.

For example:

(u)int32:  MSB:Bit 31 ...  LSB: Bit 0
(u)int16:  MSB:Bit 15 ...  LSB: Bit 0
(u)int8 :  MSB:Bit  7 ...  LSB: Bit 0

With your cast to a 16-bit value, the bytes are arranged like this:

16Bit                <=  8Bit       8Bit
MSB     ...    LSB       BYTE[1]    BYTE[0]
Bit15          Bit0      Bit7 .. 0  Bit7 .. 0
0000 0001 0000 0000      0000 0001  0000 0000

which is 256, the correct value.
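If you want to print such a bit pattern yourself, std::bitset can do it. The helper below is only a sketch, and bits16 is an invented name.

```cpp
#include <bitset>
#include <cstdint>
#include <string>

// Returns the 16 bits of a value as text, most significant bit on the left.
std::string bits16(std::uint16_t value) {
    return std::bitset<16>(value).to_string();
}
```

For the value 256 this gives 0000000100000000, matching the diagram above without the spacing.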
