I'm trying to assemble a complete uint32_t from vectors of uint8_t bytes. It should be filled incrementally, and the bytes can arrive in any of the following splits:

  • 1 byte and 3 bytes.
  • 2 bytes and 2 bytes.
  • 4 bytes.
  • 3 bytes and 1 byte.

    uint32_t L = 0;
    uint32_t* LPtr = &L;
    std::vector<uint8_t> data1 = {0x1f, 0x23};
    memcpy(LPtr, data1.data(), 2);
    EXPECT_EQ(0x231f, L);

The above works fine (first two bytes). But the following does not (with the two sets of bytes).

    uint32_t L = 0;
    uint32_t* LPtr = &L;
    std::vector<uint8_t> data1 = {0x1f, 0x23};
    std::vector<uint8_t> data2 = {0x3a, 0xee};
    memcpy(LPtr, data1.data(), 2);
    memcpy(LPtr, data2.data(), 2);
    EXPECT_EQ(0x231f, L);
    EXPECT_EQ(0x231fee3a, L);

I think the issue is that LPtr does not point to the next byte to be filled. I tried LPtr+2, but that does not advance by individual bytes of the uint32_t.

This should be done using memcpy, and the output should go into a uint32_t. Any idea how to get this sorted?

Endianness is not an issue for now. It can be corrected once the uint32_t is complete (when all 4 bytes have eventually been copied).

Any help is appreciated!

Amila Senadheera
  • 2
    `memcpy(reinterpret_cast<uint8_t*>(LPtr) + 2, data1.data(), 2);` – Chris Dodd Aug 13 '22 at 17:40
  • 1
    The second call to `memcpy` overwrites the first data, because you pass the exact same destination pointer. – Some programmer dude Aug 13 '22 at 17:41
  • 3
    You may want to use bit shifts instead of memcpy. – 273K Aug 13 '22 at 17:42
  • 1
    Have you looked up how your CPU stores data in memory? The bytes might not be in the order you expect. https://en.wikipedia.org/wiki/Endianness – Pepijn Kramer Aug 13 '22 at 17:43
  • 1
    Instead of memcpy, have a look at std::copy; it is a bit more type safe and will in practice compile to assembly that is at least as good as memcpy. – Pepijn Kramer Aug 13 '22 at 18:04
  • Even if you get the copy right, you still have the problem of endianness. In short, some architectures store the low byte first in memory, others the high byte. By using arithmetic (shifts etc.) you can write code that runs on all architectures and is typically faster than code that uses `memcpy`. – Lindydancer Aug 13 '22 at 19:13
  • Bitcasting to an `std::array` may also work https://stackoverflow.com/questions/58320316/stdbit-cast-with-stdarray – Sebastian Aug 13 '22 at 20:37
  • 1
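    So what is the problem? This is basic bitwise arithmetic, and shifts give the same result on every CPU. If the bytes are ordered most significant first (big-endian order): `uint32_t ret = (uint32_t(data[0]) << 24) | (uint32_t(data[1]) << 16) | (uint32_t(data[2]) << 8) | data[3];` If they are ordered least significant first (little-endian order, the in-memory layout on Intel/AMD and on ARM with Android and iOS, while PowerPC and UltraSPARC are typically big-endian): `uint32_t ret = data[0] | (uint32_t(data[1]) << 8) | (uint32_t(data[2]) << 16) | (uint32_t(data[3]) << 24);` – Victor Gubin Aug 13 '22 at 21:51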
  • P.S. About [Endianness](https://en.wikipedia.org/wiki/Endianness) – Victor Gubin Aug 13 '22 at 21:54
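
The shift-based approach suggested in several comments above can also be applied incrementally. The following is only a minimal sketch, not code from any commenter: the helper name `appendBytes` is hypothetical, and it assumes the chunks arrive most significant byte first.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: shifts each incoming byte into the low end of `value`,
// so the first byte received ends up as the most significant byte.
// The result does not depend on the host CPU's endianness.
inline uint32_t appendBytes(uint32_t value, const std::vector<uint8_t>& chunk)
{
    for (uint8_t byte : chunk) {
        value = (value << 8) | byte;
    }
    return value;
}
```

For example, `appendBytes(appendBytes(0, {0x23, 0x1f}), {0xee, 0x3a})` yields `0x231fee3a`, and a 1 + 3 or 3 + 1 split of the same bytes gives the same value. The caller is responsible for stopping after four bytes.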

1 Answer

The problem is that LPtr is a pointer to uint32_t, so pointer arithmetic on it advances in steps of 4 bytes (sizeof(uint32_t)), not 1 byte. Cast it to a byte pointer first. Here is a version that populates all four bytes of L; the resulting value still depends on the host's endianness:

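    #include <cassert>
    #include <cstdint>
    #include <cstring>
    #include <vector>

    uint32_t gimmeInteger(const std::vector<uint8_t>& data1, const std::vector<uint8_t>& data2)
    {
      assert(data1.size() == 2);
      assert(data2.size() == 2);
      uint32_t L = 0;
      // A byte pointer, so that +2 below advances by 2 bytes rather than 2 uint32_t elements.
      uint8_t* LPtr = reinterpret_cast<uint8_t*>(&L);
      std::memcpy(LPtr, data1.data(), 2);      // fills bytes 0 and 1
      std::memcpy(LPtr + 2, data2.data(), 2);  // fills bytes 2 and 3
      return L;
    }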
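
The same idea can be extended to the other splits from the question (1 + 3, 3 + 1, or all 4 bytes at once) by remembering how many bytes have already been copied. This is only a sketch under that assumption; the `WordAccumulator` type and its members are hypothetical names, not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical accumulator: collects bytes into a uint32_t with memcpy,
// remembering how many bytes have been written so far.
struct WordAccumulator {
    uint32_t value = 0;
    std::size_t filled = 0;  // number of bytes already copied (0..4)

    // Copies the chunk directly after the previously written bytes.
    // Returns true once all four bytes are present.
    bool append(const std::vector<uint8_t>& chunk)
    {
        assert(filled + chunk.size() <= sizeof(value));
        if (!chunk.empty()) {
            // Byte pointer arithmetic: + filled advances by bytes, not by uint32_t.
            std::memcpy(reinterpret_cast<unsigned char*>(&value) + filled,
                        chunk.data(), chunk.size());
            filled += chunk.size();
        }
        return filled == sizeof(value);
    }
};
```

On a little-endian host, appending `{0x1f, 0x23}` and then `{0x3a, 0xee}` leaves `value` equal to `0xee3a231f`; as the question notes, the byte order can be corrected after the fourth byte has arrived.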
Bartosz Charuza
  • 1
    One could mention that this is one of the few instances, where `reinterpret_cast` is used legally without aliasing trouble. (In general cases for unrelated types only possible for `std::byte`, `char` and `unsigned char` - `uint8_t` should be one of those - but e.g. not for `signed char`.) https://en.cppreference.com/w/cpp/language/reinterpret_cast – Sebastian Aug 14 '22 at 05:24
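
Besides being permitted by the aliasing rules, the cast to a byte pointer is what changes the step size of pointer arithmetic. A small sketch to illustrate this; the `strideInBytes` helper is a hypothetical name introduced only for this example.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: returns how many bytes `ptr + 1` lies beyond `ptr`.
template <typename T>
std::ptrdiff_t strideInBytes(T* ptr)
{
    return reinterpret_cast<unsigned char*>(ptr + 1) -
           reinterpret_cast<unsigned char*>(ptr);
}
```

Called with a `uint32_t*` this returns 4, and with the same address cast to `unsigned char*` it returns 1, which is why `LPtr + 2` in the question skips 8 bytes instead of 2.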