How do I read a little-endian 64-bit value from a byte buffer?

Question

In a C application (not C++) I have a byte array of data received over the network. The array is 9 bytes long. The bytes 1 to 8 (zero-based) represent a 64-bit integer value as little endian. My CPU also uses little endian.

How can I convert these bytes from the array to an integer number?

I've tried this:

uint8_t rx_buffer[2000];
//recvfrom(sock, rx_buffer, sizeof(rx_buffer) - 1, ...)
int64_t sender_time_us = *(rx_buffer + 1);

But it gives me values like 89, 219, 234 or 27. The sender sees the values as 1647719702937548, 1647719733002117 or 1647719743790424. (These examples don't match, they're just random samples.)

ygoe, Title is _read a little-endian 64-bit_ and your machine is _My CPU also uses little endian_. Since not all machines are the same endian, a portable solution would not depend on the local machine having the same endian. Too bad you were not seeking a portable answer as that would have made the Q&A more valuable. — chux - Reinstate Monica, Mar 20 '22 at 08:54
@chux-ReinstateMonica Why should I seek a portable answer when my code targets a single specific platform? I know the portable answer and it's quite long to bit-shift 8 bytes. I was looking for a quick and simple solution. — ygoe, Mar 20 '22 at 11:19
Even if you don’t care, someone who comes after you might. And the inconvenience is not even that big: you only need to write a helper function once. — user3840170, Mar 20 '22 at 16:17

ikegami · Accepted Answer · 2022-03-20T00:34:05.950

4

Unsafe solution:

int64_t sender_time_us = *(int64_t*)(rx_buffer + 1);

This is potentially an alignment violation, and it's a strict aliasing rule violation. It's undefined behaviour. On some machines, this can kill your program with a bus error.

Safe solution:

int64_t sender_time_us;
memcpy( &sender_time_us, rx_buffer + 1, sizeof( int64_t ) );

@Nate Eldredge points out that while this solution may look inefficient, a decent compiler should optimize this into something efficient. The net effect will be (a) to force the compiler to properly handle the unaligned access, if the target needs any special handling, (b) to make the compiler properly understand the aliasing and prevent any optimizations that would break it. For a target that is able to handle unaligned accesses normally, the generated code may not change at all.
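
As a minimal sketch of the memcpy approach applied to the question's 9-byte packet: the function name `read_sender_time_us` and the length check are assumptions for illustration, not part of the original code. The result equals the sender's value only when the host is little-endian, as the question states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copies the 8 payload bytes that follow the first
 * byte of the packet into an int64_t. The bytes are taken in host order,
 * so this matches the sender only on a little-endian host. */
static int64_t read_sender_time_us(const uint8_t *packet, size_t len)
{
    int64_t value;

    /* Assumed layout: byte 0 is a header byte, bytes 1..8 hold the value. */
    assert(len >= 1 + sizeof value);

    /* memcpy has no alignment requirement and does not violate aliasing. */
    memcpy(&value, packet + 1, sizeof value);
    return value;
}
```

With the question's buffer, the call would look like `int64_t sender_time_us = read_sender_time_us(rx_buffer, received_len);`, where `received_len` is a placeholder for the byte count returned by recvfrom.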

edited Mar 20 '22 at 00:34

answered Mar 19 '22 at 20:51

ikegami


If you actually wanted to discourage the unsafe solution, you wouldn’t put it first. – user3840170 Mar 20 '22 at 05:39
Rather than maintain the _type_ and _object_, how about using the _object_ only? `memcpy( &sender_time_us, rx_buffer + 1, sizeof sender_time_us);` – chux - Reinstate Monica Mar 20 '22 at 08:43
If local machine is big endian, this would not read the buffer in a _little-endian 64-bit_ way. Perhaps OP does not care about portability. – chux - Reinstate Monica Mar 20 '22 at 08:45
No, I don't care about portability. See my reply comment in the question. – ygoe Mar 20 '22 at 11:21
@chux - Reinstate Monica The OP specifically said they were using a LE machine – ikegami Mar 20 '22 at 18:03

score 1 · Answer 2 · answered Mar 19 '22 at 20:07

Your code is only getting a single uint8_t. You would need to cast to int64_t first. Something like this:

int64_t* pBuffer = (int64_t*)(rx_buffer + 1);
int64_t sender_time_us = *pBuffer;

But you should be aware that some CPUs may not like accessing 64-bit values that are not aligned. It may also be OK to do this if you know the endianness, but it would actually be better to handle it in a more portable way.

answered Mar 19 '22 at 20:07

Jim Rhodes


This is identical to [the other answer](/a/71541672/3840170), which is a pointer alignment and/or a strict aliasing violation. – user3840170 Mar 20 '22 at 06:20
@user3840170 I posted my answer first. – Jim Rhodes Mar 20 '22 at 18:03

score 1 · Answer 3 · answered Mar 20 '22 at 06:23

The portable way to read a little-endian 64-bit value is very straightforward:

inline static uint64_t load_u64le(const void *p) {
    const unsigned char *q = p;
    uint64_t result = 0;
    result |= q[7]; result <<= 8;
    result |= q[6]; result <<= 8;
    result |= q[5]; result <<= 8;
    result |= q[4]; result <<= 8;
    result |= q[3]; result <<= 8;
    result |= q[2]; result <<= 8;
    result |= q[1]; result <<= 8;
    result |= q[0];
    return result;
}

inline static int64_t load_i64le(const void *p) {
    return (int64_t)load_u64le(p);
}

Simply invoke this helper function as load_i64le(rx_buffer + 1). Modern compilers are able to optimize this to a single instruction on architectures where that is possible.
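
A short worked example may help to see the byte order at work. The sketch below uses an equivalent loop form of the same shift-and-OR idea; the name `load_u64le_loop` is hypothetical and chosen only to avoid clashing with the helpers above.

```c
#include <stdint.h>

/* Hypothetical loop variant of the byte-by-byte little-endian load.
 * q[0] is the least significant byte, q[7] the most significant. */
static uint64_t load_u64le_loop(const void *p)
{
    const unsigned char *q = p;
    uint64_t result = 0;

    /* Start at the most significant byte and shift earlier bytes up. */
    for (int i = 7; i >= 0; i--) {
        result = (result << 8) | q[i];
    }
    return result;
}
```

For a packet whose bytes 1 to 8 are 01 02 03 04 05 06 07 08, `load_u64le_loop(packet + 1)` yields 0x0807060504030201 regardless of the host's byte order.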

To read a 64-bit value where you specifically know the endianness agrees with the native ABI, you can use this:

inline static uint64_t load_u64(const void *p) {
    uint64_t result;
    memcpy(&result, p, sizeof(result));
    return result;
}

which has even better chances of being optimized into a simple load, assuming only that the compiler optimizes a short memcpy into an inline memory load.

For best results then, you can use:

inline static uint64_t load_u64le(const void *p) {
    uint64_t result = 0;
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    memcpy(&result, p, sizeof(result));
#else
    const unsigned char *q = p;
    result |= q[7]; result <<= 8;
    result |= q[6]; result <<= 8;
    result |= q[5]; result <<= 8;
    result |= q[4]; result <<= 8;
    result |= q[3]; result <<= 8;
    result |= q[2]; result <<= 8;
    result |= q[1]; result <<= 8;
    result |= q[0];
#endif
    return result;
}

Now, why you shouldn’t cast an offset pointer like the other answers suggest: first of all, because dereferencing a misaligned pointer is UB. Not every architecture supports reading words wider than 8 bits from arbitrary addresses, and even on those architectures that do support them, the compiler may still make the assumption that all dereferenced addresses are properly aligned when generating code, especially under optimizations. If you ever run your code with UBSan, it will also complain.
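
To make the alignment point concrete, here is a small sketch that checks whether an address is suitably aligned for int64_t. The helper name `is_aligned_for_i64` is hypothetical, and the check assumes that converting a pointer to uintptr_t preserves the numeric address, which is implementation-defined but holds on common flat-memory targets.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if p meets the alignment requirement of int64_t.
 * Assumes the uintptr_t conversion yields the numeric address (C11 _Alignof). */
static bool is_aligned_for_i64(const void *p)
{
    return (uintptr_t)p % _Alignof(int64_t) == 0;
}
```

If rx_buffer itself happens to be aligned for int64_t, then rx_buffer + 1 cannot be on any target where `_Alignof(int64_t)` is greater than 1, so `is_aligned_for_i64(rx_buffer + 1)` would return false there.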

The second reason is strict aliasing. The C language stipulates that memory must be accessed either through a pointer to a character type (char, signed char or unsigned char) or through a pointer to the type of the object stored in that memory; this lets the compiler assume that pointers to different types do not alias (point to the same memory). In practice, uint8_t is usually an alias of unsigned char, a character type that is exceptionally allowed to alias any type; this makes the strict aliasing concern mostly theoretical so far. Nevertheless, there is no reason to take that risk when avoiding it is so easy and cheap.

Nicely explained. – chux - Reinstate Monica Mar 20 '22 at 08:47

score -1 · Answer 4 · answered Mar 19 '22 at 20:09

You need to cast your pointer, like so:

int64_t sender_time_us = *(int64_t*)(rx_buffer + 1);

As it is, you're only getting one byte of data.

answered Mar 19 '22 at 20:09

SGeorgiades


This is a strict aliasing violation, which I believe is undefined behaviour. On some machines, this can result in a fatal signal. In simpler terms, some machines care about alignment, and there's no indication `rx_buffer + 1` is suitably aligned for an `int64_t`. – ikegami Mar 19 '22 at 20:47
Strict aliasing and alignment are separate concerns, but this violates both of them. – user3840170 Mar 19 '22 at 20:52
Works perfectly on an ESP32 module. That's all I can say. I'd like to avoid copying too much stuff around. – ygoe Mar 19 '22 at 21:20
@ygoe *Works perfectly on an ESP32 module.* Until it doesn't for some reason. Have higher standards than writing code with UB in it and hoping it continues to work just because you've never observed it failing - **yet**. – Andrew Henle Mar 20 '22 at 00:06
