Given that you have a uint16
there, a lookup table. That's a one liner for the reverse, of course the initialization takes some more work. It's not necessarily the fastest way either since that's a pretty big table and you'll probably index into it relatively randomly which is liable to cause cache misses. Also still simple, and a lot less space-hungry: a lookup table that reverses bytes. Use it the obvious way:
return (bitreverse[x & 0xFF] << 8) | bitreverse[x >> 8];
Of course you still need to deal with the initialization, but speed matters less there.
There is an (in my opinion) more elegant but a bit more complicated way using only some bitwise operations (and no totally incomprehensible remainder operations or anything like that), reversing first adjacent bits, then adjacent 2-bit pieces, then adjacent nibbles, and so on doubling the width of the reversed thing every step until in the end all the bits are reversed "globally". You can reorder these steps by the way, and there is a way to write them with one operation less but also less ILP.
x = ((x & 0x5555) << 1) | ((x >> 1) & 0x5555); // swap bits
x = ((x & 0x3333) << 2) | ((x >> 2) & 0x3333); // swap 2-bit pieces
x = ((x & 0x0f0f) << 4) | ((x >> 4) & 0x0f0f); // swap nibbles
return (x >> 8) | (x << 8); // optimized last step