
Could someone explain this code to me, please? I have received some bytecode from an assembler and now I have to use it in my virtual machine. This code is used, but I don't know how it works or what it is for.

static int32_t bytecode_to_int32 (const uint8_t* bytes)
{
  uint32_t result = (uint32_t)bytes[0] << 24 |
                    (uint32_t)bytes[1] << 16 |
                    (uint32_t)bytes[2] <<  8 |
                    (uint32_t)bytes[3] <<  0 ;
  return (int32_t)result;
}
jps
  • It copies 4 bytes into a uint32_t variable: `bytes[0]` into the most significant bits, `bytes[1]` into the next most significant, ... `bytes[3]` into the least significant. Maybe it's for converting a big-endian unsigned integer into the endianness of the local machine – Ingo Leonhardt Feb 06 '20 at 14:55
  • [I wrote it](https://stackoverflow.com/a/60092721/584518), and as mentioned then, it swaps byte order from big endian to little endian. Now would be a good time to study endianess, bit shifts, bitwise OR... – Lundin Feb 06 '20 at 14:56
  • It converts four bytes in big-endian order into a native 32-bit value. The bytes need not be properly aligned. The native value might be big-endian or little-endian. – Jonathan Leffler Feb 06 '20 at 14:56
  • And yeah @JonathanLeffler is correct, strictly speaking it converts from big endian to the endianness of the CPU, whatever that happens to be - portably. This is why bit shifts are superior to any other version - in addition to being very fast, they give CPU independent code. – Lundin Feb 06 '20 at 14:59
  • In general, when you see this kind of code and don't understand it, you should at least try providing some sample input, examine the output, and maybe step through in a debugger. You should be able to form some kind of intuition before you have to give up. In this case it may also help to know about [endianness](https://en.wikipedia.org/wiki/Endianness). – Useless Feb 06 '20 at 15:09
  • Endianness has nothing to do with it. – n. m. could be an AI Mar 25 '23 at 09:58

2 Answers


It builds up a 32-bit word from 4 bytes. For example, if the bytes are 1st: 0x12, 2nd: 0x34, 3rd: 0x56, 4th: 0x78, then:

static int32_t bytecode_to_int32 (const uint8_t* bytes)
{
  uint32_t result = (uint32_t)bytes[0] << 24 | // -> 0x12000000
                    (uint32_t)bytes[1] << 16 | // -> 0x00340000
                    (uint32_t)bytes[2] <<  8 | // -> 0x00005600
                    (uint32_t)bytes[3] <<  0 ; // -> 0x00000078
  return (int32_t)result; // ORing the values above together -> 0x12345678
}
Jonathan Leffler
Eraklon
  • Also notably, the casts to `uint32_t` are absolutely mandatory or the code may end up shifting values into the sign bit of the `bytes[0]` operand, which is implicitly promoted by the `<<` operator to signed `int`. – Lundin Feb 06 '20 at 15:02

This function combines the four bytes in a uint8_t[4], interpreted in big-endian byte order, into a single uint32_t, casts the result to a signed int32_t, and returns that.

So, if you pass a pointer to the array { 0xAA, 0xBB, 0xCC, 0xDD } to the function, it will combine them into a 32-bit integer with the most significant bytes of the integer coming from the lowest addresses in the array, giving you 0xAABBCCDD or -1430532899.

However, if the array pointed to by the argument bytes is not at least four bytes long, reading past its end is undefined behavior.

Govind Parmar