Efficient big endian byte array to native integer conversion in C

Question

1) Are there any compiler builtins, or assembly instructions for x86, ARM or another architecture, that will take a big endian byte array (2 bytes -> uint16_t, 4 bytes -> uint32_t, 8 bytes -> uint64_t) and convert it to an unsigned integer of native endianness (big, little, mixed)?

2) Are there also any builtins or instructions to perform the inverse conversion (integer to big endian byte array)?

Naive native C functions would be:

static inline void put_be16(uint8_t a[static sizeof(uint16_t)], uint16_t val)
{
    a[0] = (val >> 8) & 0xff;
    a[1] = val & 0xff;
}

static inline uint16_t get_be16(uint8_t const a[static sizeof(uint16_t)])
{
    return (a[0] << 8) | a[1];
}

These are for reading unsigned integers from inbound network packets, and encoding unsigned integers for use in outbound network packets.

Solutions must prevent or mitigate unaligned memory accesses.

Edit: And looking to be efficient, so something that operates directly on the input/output buffer is what I'm really looking for.

Yes, most compilers have builtins. There are some headers that try to portably expose efficient versions across compilers, e.g. `be16toh()` on systems that have that function. — Peter Cordes, Sep 24 '19 at 21:59
Regarding `ntohs` and `htons`: in this case, no. I'm trying to avoid the memcpy that'd be required for proper aligned access, which is a prerequisite of those functions/macros. There's also a pedantic corner of the internet which argues use of those functions/macros is wrong in the majority of cases when dealing with network packets. — Arran Cudbard-Bell, Sep 24 '19 at 22:00
Same issue with `be16toh()` unless it's guaranteed to produce aligned accesses. This is why the question is explicit about the input being a byte array for the 'from network' solution, and a byte array for the output of the 'to network' solution. — Arran Cudbard-Bell, Sep 24 '19 at 22:04
Those functions do not take pointers, they don't produce any accesses themselves. It's your job to load and store the input and output which is where alignment comes into play. — Jester, Sep 24 '19 at 22:07
Which involves a memcpy which as I stated, I'm trying to avoid. — Arran Cudbard-Bell, Sep 24 '19 at 22:07
No it does not necessarily involve memcpy - you do not need to copy the whole array, you just need to load and store the items one by one. Also, that's no fault of the functions. — Jester, Sep 24 '19 at 22:08
Note: `return (a[0] << 8) | a[1];` is technically UB on 16-bit `int` machines when `a[0]` is 128 or more. `return ((unsigned) a[0] << 8) | a[1];` is better. — chux - Reinstate Monica, Sep 24 '19 at 22:17
@Jester could you provide some example code that's architecture agnostic, I'm not sure I understand what you mean there. — Arran Cudbard-Bell, Sep 24 '19 at 22:17
@chux Could you explain why it's UB, not doubting, just interested to know. — Arran Cudbard-Bell, Sep 24 '19 at 22:19
`128 << 8` shifts into the sign bit: "The result of E1 << E2 is .... If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined" — chux - Reinstate Monica, Sep 24 '19 at 22:20
But isn't E1 an unsigned type in this instance being uint8_t? — Arran Cudbard-Bell, Sep 24 '19 at 22:27
Ugh, integer promotion is something I can never remember the rules for. I'd say `return ((uint16_t)a[0] << 8) | a[1];` was actually the correct code, but the current code does work as posted. — Arran Cudbard-Bell, Sep 24 '19 at 22:40
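
The promotion issue discussed in the comments above can be sketched as follows. This is a minimal illustration, assuming only `<stdint.h>`: `get_be16` follows the question's code with the cast chux suggested, while `get_be32` is a hypothetical companion (not in the original question) included because the same problem appears there even with a 32-bit `int`.

```c
#include <stdint.h>

/* uint8_t operands are promoted to (signed) int before the shift, so the
 * byte is converted to an unsigned type first. With a 16-bit int,
 * a[0] << 8 would otherwise shift into the sign bit for a[0] >= 0x80. */
static inline uint16_t get_be16(uint8_t const a[static sizeof(uint16_t)])
{
    return (uint16_t)(((unsigned)a[0] << 8) | a[1]);
}

/* Hypothetical 32-bit companion. Without the casts, a[0] << 24 overflows
 * a 32-bit int whenever a[0] >= 0x80. */
static inline uint32_t get_be32(uint8_t const a[static sizeof(uint32_t)])
{
    return ((uint32_t)a[0] << 24) | ((uint32_t)a[1] << 16) |
           ((uint32_t)a[2] << 8)  |  (uint32_t)a[3];
}
```

Both functions read one byte at a time, so they make no alignment assumptions about the buffer.
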
I think I misunderstood your problem, I thought you had a massive array of values to byte swap but apparently your array is only the size of a single item. Anyway, for the `get_xx` versions both gcc and clang seem to recognize the intent and optimize it to unaligned load and appropriate byte swapping if available on the architecture. — Jester, Sep 24 '19 at 22:59
That also holds for the `put_xx` except for 32 bit ARM gcc for which I couldn't make it work. — Jester, Sep 24 '19 at 23:04
However `val = htonl(val); memcpy(a, &val, 4);` does work there as well. It produces a `rev` + `str` pair. — Jester, Sep 24 '19 at 23:10
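
The `htonl()` plus `memcpy()` store from the comment above could look roughly like this. It is a sketch, assuming a POSIX system that provides `htonl()` in `<arpa/inet.h>`; the name `put_be32` is hypothetical and simply follows the question's `put_be16` convention.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>  /* htonl(); assumes a POSIX system */

/* Hypothetical name, modelled on the question's put_be16(). */
static inline void put_be32(uint8_t a[static sizeof(uint32_t)], uint32_t val)
{
    val = htonl(val);              /* host order -> big endian, done on the value */
    memcpy(a, &val, sizeof(val));  /* store that makes no alignment assumption about a */
}
```

Because the copy has a small fixed size, compilers can usually replace the `memcpy()` call with a plain store on targets that allow unaligned stores.
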
@ArranCudbard-Bell " but the current code does work as posted." --> if your tests only include a 32-bit `int/unsigned`, then yes it works. Even with 16-bit `int`, your code may "work", yet still rely on UB. — chux - Reinstate Monica, Sep 25 '19 at 03:27
@ArranCudbard-Bell Answers to Q1 & Q2 are both yes. "And looking to be efficient" is a worthy goal, yet true efficiency comes with looking at the larger picture, not this narrow micro-optimization goal. To really get speed performance, post your best true code with a timing assessment and ask for how to improve its speed. — chux - Reinstate Monica, Sep 25 '19 at 03:33

score 0 · Answer 1 · answered Sep 24 '19 at 22:14

As far as the C side, I think this has been asked before. I found this with a quick search: Convert Little Endian to Big Endian

You can use htonl(), htons(), and related, which will convert to and from a known big-endian format. You can also use a union to extract the individual bytes from an int or long, etc., as in:

union extract_byte_val {
    uint32_t long_val;                    /* 32 bits, matching what htonl() returns */
    uint8_t bytes_val[sizeof(uint32_t)];
};

For example:

uint32_t x = 128;
union extract_byte_val eb;
eb.long_val = htonl(x);

eb.bytes_val is now {0, 0, 0, 128}, since eb.long_val is big-endian.

John Bayko

AFAIK, you can't use unions to parse network packets like that. On systems where unaligned accesses are fatal, you'll kill your program. Happy to be proved wrong. – Arran Cudbard-Bell Sep 24 '19 at 22:16
and this isn't a duplicate of the question you posted. – Arran Cudbard-Bell Sep 24 '19 at 22:18
I think the compiler would re-align things, but that would slow things down, so I see your point. I can't think of any modern CPUs that handle non-aligned integers, they all assume that you can manage to get the data aligned into memory somehow first. – John Bayko Sep 24 '19 at 22:19
I thought the answers to the question might give you some ideas. – John Bayko Sep 24 '19 at 22:20
Note that `htonl()` is not a C standard function. Whether it takes an `int/unsigned/uint32_t/uint16_t/...` is implementation defined. I'd expect most implementations to take some unsigned argument, yet the width can vary. `htonl32()` and the like are more predictable. – chux - Reinstate Monica Sep 24 '19 at 22:24
Unfortunately even modern compilers aren't that smart. There's only a limited subset of operations/builtin functions which guarantee aligned accesses, of which `memcpy` is one. That's why casting a union over a buffer containing a network packet to extract fields at fixed offsets, if those fields are > 8bits is bad practice if you're targeting an unknown architecture. – Arran Cudbard-Bell Sep 24 '19 at 22:24
@chux `htonl()` is defined by POSIX to take a 32 bit unsigned int. – Shawn Sep 24 '19 at 22:29
@Shawn Limiting ourselves to POSIX is not clearly required here. – chux - Reinstate Monica Sep 24 '19 at 22:30
@chux The only non-POSIX conforming OS anyone is likely to use is Windows, and Winsock also specifies that it works with unsigned 32 bit ints. Pretending it might work with a different size of integer is pretty silly these days. – Shawn Sep 24 '19 at 22:33
@Shawn Billions of processors/year are embedded ones. A large number of those do not have any OS. C is designed to be highly portable across old and new/novel architectures. `htonl()` inherently implies `host-to-network-long`, yet many `long` are now 64-bit. OP is looking for high portability given the "big, little, mixed". [htobe16(3) - Linux man page](https://linux.die.net/man/3/htobe16) describes a better, less ambiguous function set design. – chux - Reinstate Monica Sep 24 '19 at 22:40
@chux Name an embedded system that provides a `htonl()` that doesn't work on 32 bit values. I'll be honestly very surprised if such a thing exists in modern times. Plus does OP say anything about working in an embedded environment? (I used to feel the same way about `htonl()` until it was pointed out to me that it's standardized to 32 bit values. At that point, why be pedantic when you can be practical?) – Shawn Sep 24 '19 at 22:48
The issue with `htonl()` is more about needing to copy the bytes to an intermediary location, so it's more to do with the efficiency than the portability. – Arran Cudbard-Bell Sep 24 '19 at 22:58
@Shawn `htonl()` and friends 1) lack a 64-bit version, 2) lack clear naming for wider types, and 3) refer to network order rather than the "big endian" explicitly requested by OP. OP did not mention an OS in the question either. Using `htobeNN()` is not a pedantic concern, but a clear, forward-thinking approach, as many newer functions tend to use `NN`, especially anything in communication. There is a good practical reason why *nix rolled these out for clarity. – chux - Reinstate Monica Sep 24 '19 at 23:15
@Shawn Give you the option for last word, else we should move this to chat. – chux - Reinstate Monica Sep 24 '19 at 23:15
@ArranCudbard-Bell: you can use `memcpy` to express an unaligned load into an object. It will optimize away on machines that support unaligned loads, but unfortunately can compile horribly (an actual call to the libc function) on ISAs like MIPS. In GNU C, you can use `typedef int unaligned_int __attribute((aligned(1)))` or something; see [Why does glibc's strlen need to be so complicated to run quickly?](//stackoverflow.com/a/57676035). So only non-GNU-compatible compilers would need the memcpy fallback. – Peter Cordes Sep 24 '19 at 23:24
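
The two ways of expressing an unaligned load mentioned in the comment above might be sketched like this. The function and typedef names (`get_be32_memcpy`, `get_be32_gnu`, `unaligned_u32`) are hypothetical, `ntohl()` is assumed to come from a POSIX `<arpa/inet.h>`, and the second variant assumes a GNU-compatible compiler (GCC or Clang). The `may_alias` attribute is an addition here, so that reading a byte buffer through the typedef does not run into strict-aliasing rules.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>  /* ntohl(); assumes a POSIX system */

/* Portable route: memcpy() expresses the unaligned load into a local object,
 * then ntohl() converts from big endian to host order. */
static inline uint32_t get_be32_memcpy(const uint8_t *a)
{
    uint32_t v;
    memcpy(&v, a, sizeof(v));
    return ntohl(v);
}

#if defined(__GNUC__)
/* GNU C route: a 32-bit type declared with alignment 1, so the compiler
 * must not assume the pointer is 4-byte aligned when dereferencing it. */
typedef uint32_t unaligned_u32 __attribute__((aligned(1), may_alias));

static inline uint32_t get_be32_gnu(const uint8_t *a)
{
    return ntohl(*(const unaligned_u32 *)a);
}
#endif
```

Which variant generates better code depends on the compiler and target, so comparing the generated assembly for the architectures of interest is the safest way to choose.
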
@Shawn: “You cannot show me a platform this does not work on, so I will assume it works” is not a valid engineering process. It makes things blow up and fall down. Engineering proceeds by constructing proper designs from given specifications, not from hopes that not only are there no counterexamples at hand now but that none exist at all and will not exist in the future. Knowingly designing out of specification is malpractice. – Eric Postpischil Sep 25 '19 at 00:07
"That's why casting a union over a buffer containing a network packet to extract fields at fixed offsets, if those fields are > 8bits is bad practice" - I'd consider casting in general to be bad practice, so it didn't occur to me you'd think of doing that. – John Bayko Sep 25 '19 at 23:29
I take it you can't control how the input data is aligned. I suppose the best you could do in that case is split your integer in two, based on the lower 4 bits of the address - extract the top 1-3 bytes of one, lower 3-1 bytes of the other, mask, shift, and combine the resulting integers into the one you want. I don't know how much time you'd save over the complexity of something like that, but that's up to you to decide. – John Bayko Sep 25 '19 at 23:30

score 0 · Answer 2 · answered Sep 24 '19 at 22:18

I think you can do it like this:

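/* Reverses the four bytes at `byte` in place. */
void swap_int(unsigned char *byte)
{
    unsigned char temp;

    /* Swap the outer pair of bytes. */
    temp = byte[3];
    byte[3] = byte[0];
    byte[0] = temp;

    /* Swap the inner pair of bytes. */
    temp = byte[2];
    byte[2] = byte[1];
    byte[1] = temp;
}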

CharlesRA

Unfortunately that only works for big to little endian conversion. – Arran Cudbard-Bell Sep 24 '19 at 22:19
(and vice versa). – Arran Cudbard-Bell Sep 24 '19 at 22:27
@ArranCudbard-Bell: You can detect whether your implementation is big or little endian using portable C (which should optimize away): access the bytes of a `const uint16_t testval = 0x0001`. But I wouldn't count on optimizers turning this C back into a `bswap` or `movbe` instruction. If you care about performance across compilers, you may need to use intrinsics. – Peter Cordes Sep 24 '19 at 23:21
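
The endianness check described in the comment above could be sketched as follows. The helper names (`host_is_little_endian`, `be32_bytes_to_host`) are hypothetical, and the sketch only distinguishes little endian from big endian hosts; mixed-endian layouts are not handled.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if the lowest-addressed byte of a uint16_t holds its least
 * significant byte (little endian host), 0 otherwise. */
static inline int host_is_little_endian(void)
{
    const uint16_t testval = 0x0001;
    uint8_t first;
    memcpy(&first, &testval, 1);
    return first == 0x01;
}

/* Hypothetical helper: load 4 big endian bytes without an alignment
 * assumption, and reverse them only when the host is little endian. */
static inline uint32_t be32_bytes_to_host(const uint8_t *a)
{
    uint32_t v;
    memcpy(&v, a, sizeof(v));
    if (host_is_little_endian()) {
        v = ((v & 0x000000ffu) << 24) | ((v & 0x0000ff00u) << 8) |
            ((v & 0x00ff0000u) >> 8)  | ((v & 0xff000000u) >> 24);
    }
    return v;
}
```

The check is a compile-time constant in practice, so optimizers can usually fold it away, but as the comment notes, whether the shift-and-mask sequence becomes a single byte-swap instruction is up to the compiler.
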
