Optimal and portable conversion of endian in c/c++

Question

Given a binary file with 32-bit little-endian fields that I need to parse, I want to write parsing code that compiles correctly independent of endianness of machine that executes that code. Currently I use

uint32_t fromLittleEndian(const char* data){
  return uint32_t(data[3]) << (CHAR_BIT*3) |
         uint32_t(data[2]) << (CHAR_BIT*2) |
         uint32_t(data[1]) << CHAR_BIT |
         data[0]; 
}

this, however generate inoptimal assembly. On my machine g++ -O3 -S produces:

_Z16fromLittleEndianPKc:
.LFB4:
    .cfi_startproc
    movsbl  3(%rdi), %eax
    sall    $24, %eax
    movl    %eax, %edx
    movsbl  2(%rdi), %eax
    sall    $16, %eax
    orl %edx, %eax
    movsbl  (%rdi), %edx
    orl %edx, %eax
    movsbl  1(%rdi), %edx
    sall    $8, %edx
    orl %edx, %eax
    ret
    .cfi_endproc

why is this happening? How could I convince it to produce optimal code when compiled on little endian machines:

_Z17fromLittleEndian2PKc:
.LFB5:
    .cfi_startproc
    movl    (%rdi), %eax
    ret
    .cfi_endproc

which I have gotten by compiling:

uint32_t fromLittleEndian2(const char* data){
    return *reinterpret_cast<const uint32_t*>(data);
}

Since I know my machine is little-endian, I know that above assembly is optimal, but it will fail if compiled on big-endian machine. It also violates strict-aliasing rules, so if inlined it might produce UB even on little endian machines. Is there a valid code that will be compiled to optimal assembly if possible?

Since I expect my function to be inlined a lot, any kind of runtime endian detection is out of the question. The only alternative to writing optimal C/C++ code is to use compile time endian detection, and use templates or #defines to fall back to the inefficient code if target endian is not little-endian. This however seems to be quite difficult to be done portably.

You can't match `reinterpret_cast`. It isn't doing any byte reordering. If you have to dance the endian byte shuffle, you have to pay the band. — user4581301, Apr 15 '16 at 22:39
The thing is that if my compile-target platform is little endian then I don't need byte shuffle - compiler should also know that, but it produces byte shuffled code anyway. — j_kubik, Apr 15 '16 at 22:42
Thing is the compiler doesn't know you're flipping endian. It just sees a bunch of shifts and ors. Would be a nice trick to have, though. Could play down at the makefile level and compile and link in the correct function, but that'll kill any inlining. — user4581301, Apr 15 '16 at 22:50
Given that you are parsing files won't calling something like `htonl()` be insignificant compared to the actual time you spend reading data from the `HDD`? — Galik, Apr 15 '16 at 22:52
@Galik I could but I don't want to link my code to inet just for this little thing. My `fromLittleEndian` works well, and probably quicker than anything involving calling `hton();` and freinds. And hdd throughput is likely to be much slower, I realise that. It's just that it's bugging me that I cannot get optimal assembly - This feels like something that should have been solved ages ago ;) — j_kubik, Apr 15 '16 at 22:58
@user4581301 The compiler doesn't need to know I am flipping endian. Given that it knows target-machne-endian, it should be able to tell that above code is equivalent to `reinterpret_cast`. Or do I expect too much from optimizer? — j_kubik, Apr 15 '16 at 23:00
AFAICT you cannot know via templates - the only way to find out endianness is essentially by reinterpreting the data through a pointer of different type, and that's not allowed in templates. Personally, I'd just use some `#define` provided by the compilers you are willing to support (and maybe some compiler intrinsic to swap the bytes); gcc provides `__BYTE_ORDER__` and `__bswap_32`, the other compilers will have something similar. Even better, you can just use boost.Endian and delegate the problem of dealing with the various compilers to them. — Matteo Italia, Apr 15 '16 at 23:01
I agree. A smart enough compiler should be able to generate a lovely tree, shuffle it down to it's minimum, pretty much what you have there, and then do a walk through the logic to see that nothing whatsoever happened and essentially throw the whole thing out in favor of a copy. But it doesn't look like we are there yet. — user4581301, Apr 15 '16 at 23:07
By the way, about the "probably quicker than `hton`": at least on gcc on Linux, [I wouldn't bet on it](https://godbolt.org/g/8ZY63O); the code generated for `htonl` is probably optimal, the one with the naive shifts - I wouldn't say so. — Matteo Italia, Apr 15 '16 at 23:11
@j_kubik : gcc is smart enough to turn `htonl` into a single `bswap` instruction on x86. There's no beating that, no matter how much templates you try to throw at the problem. If you really want to, you can wrap `hton(x)` functions in your own ones and provide inline optimized implementations for other not-so-smart compilers. — Daniel Kamil Kozar, Apr 15 '16 at 23:19
@MatteoItalia Nice to know - didn't expect that. The only problem is that network order is big endian. — j_kubik, Apr 15 '16 at 23:26
@j_kubik : Then that's what `` is for (man 3 endian), though it's nonstandard. — Daniel Kamil Kozar, Apr 15 '16 at 23:27
@DanielKamilKozar I found a nice discussion of `` on gcc mailing list: https://gcc.gnu.org/ml/gcc-help/2007-07/msg00342.html - it is a bit dated, but I don't think much has changed, and they don't even go as far as other compilers there. Since I already use autotools, I should probably just defer the problem to it. Still, it feels like going around the problem; compiler alone shold be able to deliver this info. — j_kubik, Apr 15 '16 at 23:36
A [very similar question](http://stackoverflow.com/a/36584577/224132) was asked recently about writing endian-agnostic code that compiled non-horribly. It's fiddly to accomplish for code that gcc and clang can both optimize to `bswap` or `movbe`. The most reliable way for compilers that support GNU C seems to be to use the `htobe64` or `htole32` functions that are wrappers around `__builtin_bswap64` and similar. Portable fallbacks are possible for other compilers. — Peter Cordes, Apr 17 '16 at 01:44

score 2 · Answer 1 · answered Apr 15 '16 at 23:59

2

short answer - use htonl - its gonna be optimized up the wazzoo

answered Apr 15 '16 at 23:59

pm100

48,078
23
82
145

3

The only problem is that network order is big endian. – j_kubik Apr 16 '16 at 00:06
yup and htonl will know that and convert or not depending on the machine its running on – pm100 Apr 18 '16 at 16:02
1

I know that, but `htonl` and friends are always converting from/to machine endian to/from big endian (network endian). My file is by definition little-endian, and I need a fuction set converting from/to machine endian to/from little endian. There is no way I see I could use `htonl` or `ntohl` to solve my problem, except maybe to convert always to big endian and then always do some byte shuffling anyway. This is not likely to be close to optimal. – j_kubik Apr 19 '16 at 02:21

score 1 · Accepted Answer · answered Apr 15 '16 at 23:49

1

Various platform libraries that I know of do this by #defining macros for the endian-swapping routines, based on the value of #define BIG_ENDIAN. In the cases where the source endianness matches your target endianness, you can just:

#ifdef LITTLE_ENDIAN
    #define fromLittleEndian(x) (x)
#else
    #define fromLittleEndian(x) _actuallySwapLittle((x))
#endif

For example:

http://man7.org/linux/man-pages/man3/endian.3.html

http://fxr.watson.org/fxr/source/sys/endian.h

answered Apr 15 '16 at 23:49

Mark Bessey

19,598
4
47
69

`` doesn't seem to be portable. See https://gcc.gnu.org/ml/gcc-help/2007-07/msg00342.html – j_kubik Apr 15 '16 at 23:55
You have to choose here - optimal or portable. @j_kubik has a portable version that's non-optimal. Various other answers will suggest other techniques that are more-or-less portable or optimal, but the only way to ensure that you get output that does nothing in the do-nothing case is to use the preprocessor. There are no guarantees that any given C++ compiler will recognize the do-nothing case otherwise. – Mark Bessey Apr 16 '16 at 00:24
I guess an answer is to attempt to detect target platform, and if unsuccessful, then use suboptimal code as fall-back. – j_kubik Apr 16 '16 at 11:33
1

The most recent versions of BOOST has an endian library, which I believe is highly portable. – Zyx 2000 Apr 16 '16 at 12:17

Optimal and portable conversion of endian in c/c++

2 Answers2

Linked