@Lundin's answer shows a pure-C shift/mask bithack that clang recognizes and compiles to a single rev
instruction. (Or presumably to x86 bswap
if targeting x86, or equivalent instructions on other ISAs that have them.)
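For reference, here is a minimal sketch of the kind of shift/mask byte-swap that compilers tend to pattern-match. The function name bswap32_portable is made up for illustration, and whether a given compiler version turns it into one instruction is something to check on your own target.

```c
#include <stdint.h>

/* Pure ISO C byte reversal of a 32-bit value.
   Each term moves one byte to its mirrored position. */
uint32_t bswap32_portable(uint32_t x)
{
    return (x >> 24)                    /* byte 3 -> byte 0 */
         | ((x >> 8) & 0x0000FF00u)     /* byte 2 -> byte 1 */
         | ((x << 8) & 0x00FF0000u)     /* byte 1 -> byte 2 */
         | (x << 24);                   /* byte 0 -> byte 3 */
}
```

The unsigned type matters here: the shifts are well-defined and nothing gets sign-extended.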
In portable ISO C, hoping for pattern-recognition is unfortunately the best you can do, because the standard hasn't added portable ways to expose this kind of CPU functionality; even C++ took until C++20 to add the <bit>
header for things like std::popcount
and C++23 std::byteswap
.
(Some fairly-portable C libraries / headers have byte-reversal, e.g. as part of networking there's ntohl
(net-to-host), which is an endian-swap on little-endian machines. Or there's the non-standard endian.h
from glibc (not GCC itself), with htobe32
being host to big-endian 32-bit; see the man page. These are usually implemented with intrinsics that compile to a single instruction in good-quality implementations.)
Of course, if you definitely want a byte swap regardless of host endianness, you could do htole32(be32toh(x))
because one of them is a no-op and the other is a byte-swap, since ARM is either big or little endian. (It's still a byte-swap even if neither of them is a no-op, even on PDP or other mixed-endian machines, but there might be more efficient ways to do it.)
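A small sketch of that trick, assuming a glibc-style endian.h is available (it is not ISO C, and on some systems the header lives elsewhere, e.g. sys/endian.h on BSDs). The wrapper name swap32_via_endian is hypothetical.

```c
#define _DEFAULT_SOURCE   /* asks glibc to expose htole32 / be32toh */
#include <endian.h>
#include <stdint.h>

/* Unconditional 32-bit byte swap built from two endian conversions.
   On a little-endian or big-endian host, exactly one of the two
   conversions swaps and the other does nothing. */
static inline uint32_t swap32_via_endian(uint32_t x)
{
    return htole32(be32toh(x));
}
```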
There are also some "collections of useful functions" headers with intrinsics for different compilers, with functions like byte swap. These can be of varying quality in terms of efficiency and maybe even correctness.
You can see that no, neither GCC nor clang optimizes your code to rbit
for ARM or AArch64: https://godbolt.org/z/Y7noP61dE . Presumably looping over bits in the other direction isn't any better. Perhaps a bithack would be recognized, such as the ones in "In C/C++ what's the simplest way to reverse the order of bits in a byte?" or "Efficient Algorithm for Bit Reversal (from MSB->LSB to LSB->MSB) in C".
GCC and clang recognize the standard bithack for popcount, but I didn't check any of the answers on the bit-reverse questions.
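To make "bithack" concrete, this is a sketch of one common way to reverse the bits of a single byte by swapping progressively smaller groups. It is an illustration of the general technique, not a copy of any particular answer on those questions, and the name reverse_byte is made up. No claim here about whether compilers recognize it.

```c
#include <stdint.h>

/* Reverse the 8 bits of b: swap nibbles, then bit pairs, then single bits. */
uint8_t reverse_byte(uint8_t b)
{
    b = (uint8_t)(((b & 0xF0u) >> 4) | ((b & 0x0Fu) << 4));
    b = (uint8_t)(((b & 0xCCu) >> 2) | ((b & 0x33u) << 2));
    b = (uint8_t)(((b & 0xAAu) >> 1) | ((b & 0x55u) << 1));
    return b;
}
```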
Some languages, notably Rust, do care more about making it possible to portably express what modern CPUs can do. foo.reverse_bits()
(since Rust 1.37) and foo.swap_bytes()
just work for any integer type on any ISA. For u32
specifically, see https://doc.rust-lang.org/std/primitive.u32.html#method.reverse_bits (That's Rust's equivalent of C uint32_t
.)
Most mainstream C implementations have portable (across ISAs) builtins or (target-specific) intrinsics (like __REV()
or __REV16()
) for stuff like this.
The GNU dialect of C (GCC/clang/ICC and some others) includes __builtin_bswap32(input)
. See "Does ARM GCC have a builtin function for the assembly 'REV' instruction?". It's named after the x86 bswap
instruction, but it's just a byte-reverse that GCC / clang compile to whatever instructions can do it efficiently on the target ISA.
There's also a __builtin_bswap16(uint16_t)
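for swapping the bytes of a 16-bit integer, like rev16 (which byte-swaps each 16-bit half of a register) or revsh
(which also sign-extends the result), except the C semantics only cover the 16-bit value and say nothing about the upper 16 bits of a 32-bit register. (Because normally you don't care about that part.) See the GCC manual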
for the available GNU C builtins that aren't target-specific.
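A minimal usage sketch of those two builtins, assuming a GNU C compatible compiler (GCC or clang). The wrapper names are hypothetical; the builtins themselves are the ones named above.

```c
#include <stdint.h>

/* Thin wrappers so callers don't depend on the GNU spelling directly. */
static inline uint32_t swap_bytes32(uint32_t x)
{
    return __builtin_bswap32(x);   /* rev on ARM, bswap on x86, etc. */
}

static inline uint16_t swap_bytes16(uint16_t x)
{
    return __builtin_bswap16(x);   /* result is just the 16-bit value */
}
```

Because these are builtins rather than inline asm, the compiler can still constant-fold them when the argument is known at compile time.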
There isn't a GNU C builtin or intrinsic for bitwise reverse that I could find in the manual or GCC arm-none-eabi 12.2 headers.
ARM documents an __rbit()
intrinsic for their own compiler, but I think that's Keil's ARMCC, so there might not be any equivalent of that for GCC/clang.
@0___________ suggests https://github.com/ARM-software/CMSIS_5 for headers that define a function for that.
If worst comes to worst, GNU C inline asm
is possible for GCC/clang, given appropriate #ifdef
s. You might also want if (__builtin_constant_p(x))
to use a pure-C bit-reversal so constant-propagation can happen on compile-time constants, only using inline asm on runtime-variable values.
uint32_t output, input=...;
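#if defined(__aarch64__)
  // %w0 / %w1 name the 32-bit W registers; a plain %0 would print the 64-bit X register
  asm("rbit %w0,%w1" : "=r"(output) : "r"(input));
#elif defined(__arm__)
  asm("rbit %0,%1" : "=r"(output) : "r"(input));
#else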
... // pure C fallback or something
#endif
Note that it doesn't need to be volatile
because rbit
is a pure function of the input operand. It's a good thing if GCC/clang are able to hoist this out of a loop. And it's a single asm instruction so we don't need an early-clobber.
Inline asm has the downside that the compiler can't fold a shift into it, e.g. if you wanted to reverse the bits of a single byte, __rbit(x) >> 24
equals __rbit(x<<24)
, which could be done with rbit r0, r1, lsl #24
. (I think).
With inline asm I don't think there's a way to tell the compiler that r1, lsl #24
is a valid expansion for the %1
input operand. Hmm, unless there's a machine-specific constraint for that? https://gcc.gnu.org/onlinedocs/gcc/Machine-Constraints.html - no, no mention of "shifted" or "flexible" source operand in the ARM section.
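Putting the inline-asm pieces together, here is a hedged sketch of a wrapper that uses rbit on ARM targets for runtime values and a plain C loop otherwise. The names rbit32 and rbit32_generic are hypothetical. It assumes a GNU C compatible compiler for the asm path, and for 32-bit ARM it assumes a core that actually has rbit; the AArch64 branch uses the w operand modifier so the 32-bit register form is emitted.

```c
#include <stdint.h>

/* Plain C fallback: peel bits off the bottom of x and push them into r. */
static inline uint32_t rbit32_generic(uint32_t x)
{
    uint32_t r = 0;
    for (int i = 0; i < 32; i++) {
        r = (r << 1) | (x & 1u);
        x >>= 1;
    }
    return r;
}

static inline uint32_t rbit32(uint32_t x)
{
#if defined(__GNUC__) && (defined(__aarch64__) || defined(__arm__))
    /* Let compile-time constants go through pure C so they can fold. */
    if (__builtin_constant_p(x))
        return rbit32_generic(x);
    uint32_t out;
#  if defined(__aarch64__)
    __asm__("rbit %w0, %w1" : "=r"(out) : "r"(x));
#  else
    __asm__("rbit %0, %1" : "=r"(out) : "r"(x));
#  endif
    return out;
#else
    return rbit32_generic(x);   /* non-ARM or non-GNU compilers */
#endif
}
```

The asm statement is deliberately not volatile, for the reason given above: the result depends only on the input, so the compiler is free to hoist or CSE it.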
Efficient Algorithm for Bit Reversal (from MSB->LSB to LSB->MSB) in C shows an #ifdef
ed version with a working fallback that uses a bithack to reverse bits within a byte, then __builtin_bswap32
or MSVC _byteswap_ulong
to reverse bytes.
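A rough sketch of that structure (not the code from the linked answer): reverse the bits inside each byte with three mask-and-shift steps, then reverse the byte order with whatever byte-swap the compiler offers. The function names byteswap32 and bitreverse32 are made up for illustration.

```c
#include <stdint.h>
#if defined(_MSC_VER)
#include <stdlib.h>               /* _byteswap_ulong */
#endif

/* Byte reversal via a compiler builtin where one is known, else pure C. */
static inline uint32_t byteswap32(uint32_t x)
{
#if defined(__GNUC__)
    return __builtin_bswap32(x);
#elif defined(_MSC_VER)
    return _byteswap_ulong(x);
#else
    return (x >> 24) | ((x >> 8) & 0x0000FF00u)
         | ((x << 8) & 0x00FF0000u) | (x << 24);
#endif
}

/* Full 32-bit bit reversal: fix up bits within each byte, then swap bytes. */
static inline uint32_t bitreverse32(uint32_t x)
{
    x = ((x >> 1) & 0x55555555u) | ((x & 0x55555555u) << 1);  /* adjacent bits */
    x = ((x >> 2) & 0x33333333u) | ((x & 0x33333333u) << 2);  /* bit pairs     */
    x = ((x >> 4) & 0x0F0F0F0Fu) | ((x & 0x0F0F0F0Fu) << 4);  /* nibbles       */
    return byteswap32(x);
}
```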