C pointer conversions

Question

Let's suppose I have a buffer of bytes:

char buffer[] = {  0x11, 0x00, 0x00, 0x02, .... };

What I want to do is to take a qword from a specific offset. The offset will be calculated in bytes. Here is what I have done:

unsigned long long *ptr = &buffer[16];
unsigned long long myqword = *ptr;

It seems to works for me. I am skipping 16 bytes from the beginning and then taking 8 bytes.

My question is why do I get this warning message:

warning: initialization of ‘long long unsigned int *’ from incompatible pointer type ‘char *’ [-Wincompatible-pointer-types]
   unsigned long long *ptr = &buffer[16];

Note you are [violating the strict aliasing rule](https://stackoverflow.com/a/51228315/1708801) — Shafik Yaghmour, Sep 18 '18 at 08:35
so what should i do to make it work without violating rules ? — Bob5421, Sep 18 '18 at 08:35
@Observer Please don't answer in comments. Cast is the worst option: it will hide the warning, but it won't fix the strict aliasing violation. — user694733, Sep 18 '18 at 08:36

Lundin · Answer 1 · 2018-09-18T11:26:55.860

There's 4 major problems here:

unsigned long long* is not compatible with char*. This is the cause for the compiler error.
Reading a bunch of char through a unsigned long long* would violate the strict aliasing rule, invoking undefined behavior bugs.
Reading a bunch of char through a unsigned long long* may also cause misaligned access, invoking undefined behavior bugs.
Using char to store integer values is a very bad idea, since char has implementation-defined signedness and could be signed on some computers. Use uint8_t instead.

What you should do instead is to use memcpy:

size_t n = sizeof(unsigned long long);
memcpy(&myqword, &buffer[i * n], n);

If you actually need to access the contents of a specific memory location without copying it, you have no other option but to make some new type, like a union:

#include <inttypes.h>
#include <stdio.h>

typedef union
{
  uint8_t  byte [8];
  uint64_t qword;
} qword_t;

int main (void)
{
  qword_t q = {.byte = {0x11, 0x00, ...} };
  printf("%"PRIu64, q.qword);
}

But please note that this code depends on CPU endianess and is non-portable.

nos · Answer 2 · 2018-09-18T08:59:47.590

You would piece together the bytes into an integer yourself.

If you are on a big endian host and your current code did get the proper integer value, use this one:

/** De-serialize a uint64_t from a byte array in big endian format.
 * @param r The byte array to containing the integer in. Must be at least of size 4.
 * @param return The deserialized integer from the byte array
 */
static inline uint64_t uc_unpack_64_be(const uint8_t *r)
{
    uint64_t v;

    v  = (uint64_t)r[0] << 56;
    v |= (uint64_t)r[1] << 48;
    v |= (uint64_t)r[2] << 40;
    v |= (uint64_t)r[3] << 32;
    v |= (uint32_t)r[4] << 24;
    v |= r[5] << 16;
    v |= r[6] << 8;
    v |= r[7];

    return v;
}

If you are currently on a little endian machine, use this one:

/** De-serialize a uint64_t from a byte array in little endian format.

 * @param r The byte array to containing the integer in. Must be at least of size 8.
 * @param return The deserialized integer from the byte array
 */
static inline uint64_t uc_unpack_64_le(const uint8_t *r)
{
    uint64_t v;

    v  = r[0];
    v |= r[1] << 8;
    v |= r[2] << 16;
    v |= (uint32_t)r[3] << 24;
    v |= (uint64_t)r[4] << 32;
    v |= (uint64_t)r[5] << 40;
    v |= (uint64_t)r[6] << 48;
    v |= (uint64_t)r[7] << 56;

    return v;
}

Use it e.g. as uint64_t myqword = uc_unpack_64_le(&buffer[16]);

Note that whether you use one of uint64_t uc_unpack_64_le or uint64_t uc_unpack_64_le functions depends on whether you formatted the data in your buffer as little or big endian, not if your code currently runs on a little or big endian machine.

If you insist on using your current long long and char types, change the code accordingly, but I encourage you to use the uint16_t and uint64_t types from the <stdint.h> header instead.

I would cast *all* elements to target type before shift to make code more portable. `r[1] << 8` might shift to sign bit, which is UB if `int` is 16-bits on system. — user694733, Sep 18 '18 at 08:52
Also it's not really necessary to use type correctness of the right operand of the shift operators. Shift operators don't use "the usual arithmetic conversions", but instead the result is always that of the integer promoted left operand. So the `ULL` part is just confusing the reader. — Lundin, Sep 18 '18 at 08:56

bobah · Answer 3 · 2018-09-18T08:46:30.397

1

Apart from the strict aliasing rule violation (covered well in the comments to the questions), another reason this warning is worth paying attention to is because you may end up accessing unaligned data.

On some architectures it would just be slower, on others your app may not survive it.

If you cast it explicitly it will shut up. But I would also tell compiler to align the char array to the size of unsigned long long (and also I would switch to uint8_t and uint64_t for clarity.

edited Sep 18 '18 at 08:46

answered Sep 18 '18 at 08:38

bobah

18,364
2
37
70

1

This answer doesn't address the strict aliasing violation, which is the worst problem. – user694733 Sep 18 '18 at 08:44
1

@user694733 - I wouldn't steal the answer from someone else's comments. But thanks for pointing out, I added a note that this answer is in addition to the strict aliasing violation covered in the comments. – bobah Sep 18 '18 at 08:47

C pointer conversions

3 Answers3