0

Usually a uint64_t, uint32_t, uint16_t, etc. can be retrieved from a char* buf as follows:

uint32_t val = *(uint32_t*) buf;

But now suppose buf is char [6], how would one retrieve a numerical value from it?

*Unsigned big endian (network byte order)

Palace Chan

3 Answers

2

A portable and standard-conforming way, whose result does not depend on the host's byte order (in contrast to pointer casting or memcpy), is to assemble the value explicitly:

uint64_t val = 0;
for (int i = 0; i < 6; ++i)
    val |= (uint64_t)(unsigned char)buf[i] << (8*(6-i-1));

This assumes big-endianness (network byte order). The extra cast to unsigned char is a hack that you would not need if your input array were already of type unsigned char*.
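As a minimal sketch, the loop above can be wrapped in a small helper. The name read_be and its n parameter are hypothetical and not part of the original answer. Reading through an unsigned char pointer makes the per-byte cast unnecessary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode an n-byte (1..8) unsigned big-endian integer. */
static uint64_t read_be(const void *buf, size_t n)
{
    const unsigned char *p = buf;  /* unsigned view, so no sign extension */
    uint64_t val = 0;

    assert(buf != NULL && n >= 1 && n <= 8);
    for (size_t i = 0; i < n; ++i)
        val = (val << 8) | p[i];   /* earlier bytes are more significant */
    return val;
}
```

For the six-byte buffer in the question, the call would be read_be(buf, 6). The result is the same on little-endian and big-endian hosts because only arithmetic on values is involved.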

Niklas B.
  • You can get around the possibility of `char` being a sign-magnitude or one's-complement representation by doing a pointer cast `((unsigned char*)buf)` instead of a value cast. – Potatoswatter Feb 28 '14 at 17:23
  • @Potatoswatter: I'm pretty sure that's implementation defined – Niklas B. Feb 28 '14 at 17:23
  • Nope, you can cast any pointer to `char *` or `unsigned char *` for object representation inspection. That's why `memcpy` works without violating the aliasing rule. – Potatoswatter Feb 28 '14 at 17:24
  • @Potatoswatter: But you don't know where that `char*` came from and whether it actually *is* the object representation that the parsing machine would give you. Or is the object representation defined in the standard? – Niklas B. Feb 28 '14 at 17:26
  • "Object representation" is a term used in the standard, meaning the way any particular object is stored as bytes in memory. – Potatoswatter Feb 28 '14 at 17:27
  • @Potatoswatter: Casting signed char to unsigned is definitely well-defined (if the signed char is negative, UINT8_MAX + 1 is added). – Niklas B. Feb 28 '14 at 17:28
  • @Potatoswatter: But that representation could vary between machines. In particular you have no guarantee that the parsing machine uses the exact internal representation that the char buffer contains (that one could come from anywhere!) – Niklas B. Feb 28 '14 at 17:28
  • Yes, but the other way is not, and the signed values of unsigned bytes sent over the network/filesystem could be surprising on an esoteric machine. For example, two values may both map to zero. Casting the pointer and reading the memory as `unsigned char` avoids any confusion. – Potatoswatter Feb 28 '14 at 17:29
  • @Potatoswatter: I think the premise here is that the bytes represent unsigned 8-bit numbers, representing in turn digits in base-256. Otherwise the whole thing wouldn't make any sense. – Niklas B. Feb 28 '14 at 17:30
  • Suffice to say, no method is without its gotchas, and the Posix committee should *really* get off its butt and standardize `htonll` etc :) . – Potatoswatter Feb 28 '14 at 17:31
0

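Using uint64_t val = 0; memcpy(&val, buf, 6);

but beware of endianness issues: this gives the right value only when both the host and the data in buf are little-endian. The big-endian buf from the question needs an extra conversion step.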

An alternate way avoiding ugly casts would be with a union:

union {
   uint64_t u64;
   uint8_t c[8];
} foo;

Write the bytes in the proper order to foo.c[] and then access foo.u64. Not 100% ISO C kosher, but does the right thing on most modern C implementations.

Jens
0

I won't ask how you ended up with a big-endian, six-byte buffer.

enum { odd_buffer_size = 6 };   // an enum constant, so the array below may have an initializer in C

char src[ odd_buffer_size ] = { … };
uint64_t dst = 0;

// Copy the big-endian data into the low-order end of a big-endian 64-bit value:
memcpy( ( (char*) & dst ) + 2, src, odd_buffer_size );
// Read the data as a value (aliasing-safe) and convert endianness:
dst = ntohll( dst );

ntohll is not standard C, but it is available on Windows and on some other platforms. Names for it vary across platforms (e.g. be64toh in glibc's <endian.h>), but some equivalent is usually available.
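Where no such function is at hand, a portable stand-in is easy to sketch. The names be64_to_host and read_be48_padded below are hypothetical. The first plays the role of ntohll, and the second repeats the padding trick from the snippet above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for ntohll: interpret the stored bytes of `be`
   as a big-endian number, regardless of the host's byte order. */
static uint64_t be64_to_host(uint64_t be)
{
    unsigned char b[8];
    uint64_t v = 0;

    memcpy(b, &be, sizeof b);      /* inspect the object representation */
    for (int i = 0; i < 8; ++i)
        v = (v << 8) | b[i];
    return v;
}

/* Pad six big-endian bytes to eight, then convert to host order. */
static uint64_t read_be48_padded(const char *src)
{
    uint64_t dst = 0;

    assert(src != NULL);
    /* Two leading zero bytes, then the six data bytes. */
    memcpy((unsigned char *)&dst + 2, src, 6);
    return be64_to_host(dst);
}
```

On a big-endian host be64_to_host returns its argument unchanged, which matches what ntohll is expected to do there.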


By the way, the trick uint32_t val = *(uint32_t*) buf; is unsafe because the buffer may be (in this particular case, almost certainly is) improperly aligned for accessing a uint32_t value.

Even forming a value of type uint32_t * with an odd address is enough to potentially crash in C. Always use memcpy or a union when reinterpreting bytes.
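A minimal sketch of the memcpy alternative to the cast. The helper name load_u32 is hypothetical. It returns the bytes in host byte order, exactly as the cast would, so any endianness conversion still has to follow.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a uint32_t from any address, aligned or not. */
static uint32_t load_u32(const void *buf)
{
    uint32_t val;

    assert(buf != NULL);
    memcpy(&val, buf, sizeof val);  /* byte-wise copy, no alignment requirement */
    return val;
}
```

No uint32_t pointer to the buffer is ever formed, so neither the alignment problem nor the aliasing rule comes into play.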

Potatoswatter