
Could someone explain this code to me, please? I have received some bytecode from an assembler and now I have to use it in my virtual machine. This code is used, but I don't know how it works or what it is for.

static int32_t bytecode_to_int32 (const uint8_t* bytes)
{
  uint32_t result = (uint32_t)bytes[0] << 24 |
                    (uint32_t)bytes[1] << 16 |
                    (uint32_t)bytes[2] <<  8 |
                    (uint32_t)bytes[3] <<  0 ;
  return (int32_t)result;
}
jps
  • It copies 4 bytes into a uint32_t variable: `bytes[0]` into the most significant bits, `bytes[1]` into the next most significant, ... `bytes[3]` into the least significant. Maybe it's for converting a big-endian unsigned integer into the endianness of the local machine – Ingo Leonhardt Feb 06 '20 at 14:55
  • [I wrote it](https://stackoverflow.com/a/60092721/584518), and as mentioned then, it swaps byte order from big endian to little endian. Now would be a good time to study endianess, bit shifts, bitwise OR... – Lundin Feb 06 '20 at 14:56
  • It converts four bytes in big-endian order into a native 32-bit value. The bytes need not be properly aligned. The native value might be big-endian or little-endian. – Jonathan Leffler Feb 06 '20 at 14:56
  • And yeah @JonathanLeffler is correct, strictly speaking it converts from big endian to the endianness of the CPU, whatever that happens to be - portably. This is why bit shifts are superior to any other version - in addition to being very fast, they give CPU independent code. – Lundin Feb 06 '20 at 14:59
  • In general, when you see this kind of code and don't understand it, you should at least try providing some sample input, examine the output, and maybe step through in a debugger. You should be able to form some kind of intuition before you have to give up. In this case it may also help to know about [endianness](https://en.wikipedia.org/wiki/Endianness). – Useless Feb 06 '20 at 15:09
  • Endianness has nothing to do with it. – n. m. could be an AI Mar 25 '23 at 09:58

2 Answers


It builds up a 32-bit word from 4 bytes. For example, if the bytes are 1st: 0x12, 2nd: 0x34, 3rd: 0x56, 4th: 0x78, then:

static int32_t bytecode_to_int32 (const uint8_t* bytes)
{
  uint32_t result = (uint32_t)bytes[0] << 24 | // -> 0x12000000
                    (uint32_t)bytes[1] << 16 | // -> 0x00340000
                    (uint32_t)bytes[2] <<  8 | // -> 0x00005600
                    (uint32_t)bytes[3] <<  0 ; // -> 0x00000078
  return (int32_t)result; // ORing the values above together -> 0x12345678
}
Jonathan Leffler
Eraklon
  • Also notably, the casts to `uint32_t` are absolutely mandatory or the code may end up shifting values into the sign bit of the `bytes[0]` operand, which is implicitly promoted by the `<<` operator to signed `int`. – Lundin Feb 06 '20 at 15:02

This function combines the four bytes in a uint8_t[4], interpreted in big-endian byte order, into a single uint32_t, casts the result to a signed int32_t, and returns that.

So, if you pass a pointer to the array { 0xAA, 0xBB, 0xCC, 0xDD } to the function, it will combine them into a 32-bit integer with the most significant bytes of the integer coming from the lowest addresses in the array, giving you 0xAABBCCDD or -1430532899.

However, if the array pointed to by the argument bytes is not at least four bytes long, reading past its end is undefined behavior.

Govind Parmar