Most Efficient Algorithm for Bit Reversal with undefined size of data

Question

What is the fastest algorithm to achieve the following:

1010 0000 => 0000 0101

The conversion is from MSB LSB to LSB MSB. All bits must be reversed but the hard part is that the size of the data is between 1 byte and 8 byte and I can't know it in advance.That mean I can not predict the size. The frame arrives with a byte that indicates the size of the frame. the size varies between 1 byte and 8.

[This](https://graphics.stanford.edu/~seander/bithacks.html) may help — Jabberwocky, Apr 20 '18 at 14:05
"What is the best" Define best. Fastest? Most readable? Least ROM/RAM memory consuming? — Lundin, Apr 20 '18 at 14:20
"best in order to get the correct result in the least time" --> use a look-up table. It would be quite large (2^64). IOWs, best in the least time is not a reasonable criteria. Programming often requires balancing competing goals. VTC as too broad/unclear for a quality answer. — chux - Reinstate Monica, Apr 20 '18 at 14:31
@chux So what it seems the best solution to have a good balancing competing goals. (another quetion what do you mean by VTC) — gerrard2461, Apr 20 '18 at 14:38
Possible duplicate of [Most Efficient Algorithm for Bit Reversal ( from MSB->LSB to LSB->MSB) in C](https://stackoverflow.com/questions/746171/most-efficient-algorithm-for-bit-reversal-from-msb-lsb-to-lsb-msb-in-c) — , Apr 20 '18 at 14:40
@Ivan Its not the same thing because the size of data is not fixe it from the begining — gerrard2461, Apr 20 '18 at 14:42
@gerrard2461 There is no difference. As I mentioned in my answer, just use 64-bit version. This will probably be the fastest as it avoids branching. — , Apr 20 '18 at 14:43
@Ivan In the case that the data long is 5 Byte it did not work you can try it — gerrard2461, Apr 20 '18 at 14:45
@gerrard2461 And what is the difference between 5 and 8 bytes? The only difference is that result will be shifted. Reverse, then shift. — , Apr 20 '18 at 14:46
@Ivan: But i dont know the data size from the begining, i can't guess if it is 5,7 or 8. It s not a matter to handle data with the same function ? — gerrard2461, Apr 20 '18 at 14:52
@gerrard2461 Handling data with the same function in many cases is the fastest way - it avoids branching. Depending on your data and how branches are getting predicted by your cpu, branches can be expensive. Bit operations on a superscalar out of order cpu are not. — , Apr 20 '18 at 14:53
@Ivan: if you reverse `0x0F` as 8 bits, you get `0xF0`, but if you reverse it as 24 bits, it should be `0xF00`. So using the 64-bit version doesn't solve the general problem. — vgru, Apr 20 '18 at 14:55
@Groo The question asks for 1-8 bytes. Bits are not mentioned. — , Apr 20 '18 at 14:57
@Ivan: Bits are mentioned, in rhe title and the body. The question states the size is one to eight bytes but does not say the size is quantized in bytes. 13 bits is between one and eight bytes. Additionally, the question does not state how the size will be determined. At the position of the leading bit? At the start of the byte containing the leading bit? By a separate length parameter? The question ought to be clarified. — Eric Postpischil, Apr 20 '18 at 16:07
@EricPostpischil The question mentions "bit reversal" as an operation. There are many operations like that involving bits bit people mostly apply them to 1, 2, 3, 4, 8 byte integers. So my assumption follows the common practice. How data size is passed is irrelevant to the algorithm. SO is not a code writing service. — , Apr 20 '18 at 16:11
@Ivan: There is no need to assume. The OP can clarify, and should. How the size is determined is relevant, both for correctness (the reversal of a value in which bit 12 is the leading set bit is different depending on whether 13 or 16 bits are to be reversed) and for performance (requiring the algorithm to determine the size from the value rather than a separate length parameter requires more work, and it may interact with the method used to effect the reversal). We do not even know how the value is passed—if it could be one byte or eight, is it passed in 64 bits? Or a byte buffer? — Eric Postpischil, Apr 20 '18 at 17:08
@Ivan The question arises for a development context in the automotive sector, so I can not predict the size. The frame arrives with a byte that indicates the size of the frame. the size varies between 1 byte and 8. — gerrard2461, Apr 26 '18 at 08:44
@gerrard2461 Since proper bit reverse algorithm has logarithmic complexity, applying it to 64-bit value should be the fastest way of doing this. Although if cpu cannot handle 64-bit integer natively, at least 2 variants can be used - 32-bit one and 64-bit one. — , Apr 26 '18 at 10:07
@Ivan yes it seems the fastest way..i will try it. thank you very much for your help — gerrard2461, Apr 26 '18 at 10:10

vogomatix · Answer 1 · 2018-04-20T15:18:54.660

Best is an arbitrary term, as you don't specify whether the amount of code or performance is your criteria.

For a 'fast' result you should just use a look up table (LUT) to reverse the bits in every byte, and then shift the bytes into the result.

// there are easy ways to generate this table instead of doing it yourself
unsigned char Reverse[256] = {
  0x00, 0x80, 0x40, 0xC0, 0x20, 0xA0, 0x60, 0xE0, 0x10, 0x90, 0x50, 0xD0, 0x30, 0xB0, 0x70, 0xF0, 
  ...
  0x0F, 0x8F, 0x4F, 0xCF, 0x2F, 0xAF, 0x6F, 0xEF, 0x1F, 0x9F, 0x5F, 0xDF, 0x3F, 0xBF, 0x7F, 0xFF
};

int last = 3-1; // last element 
unsigned char in[8] = { 0x12, 0x34, 0x56 };
unsigned char out[8];

for (int index = 0; index <= last; index++) {
    // store bytes in reverse, reverse bits in each byte using LUT
    out[ last-index] = Reverse[ in[ index] ];
}

There are a number of alternative methods to perform bit reversal on this Bit Twiddling Hacks page, if the above isn't quite satisfactory

It would be safer to initialize the LUT with a set of macros to avoid spurious bugs. — chqrlie, Apr 20 '18 at 16:24
That's why I put the comment about `generating the table`. Also the link I included in the answer has suitable macros. — vogomatix, Apr 25 '18 at 06:34

score 2 · Accepted Answer · 2018-04-20T22:06:36.710

Operations like this can be performed by incrementally moving many bits at a time using bit operations.

Example for a 32-bit integer:

uint32_t reverse(uint32_t x, size_t len)
{
    assert(len > 0 && len <= 4);
    x = ((x & 0x55555555) <<  1) | ((x & 0xAAAAAAAA) >>  1);
    x = ((x & 0x33333333) <<  2) | ((x & 0xCCCCCCCC) >>  2);
    x = ((x & 0x0F0F0F0F) <<  4) | ((x & 0xF0F0F0F0) >>  4);
    x = ((x & 0x00FF00FF) <<  8) | ((x & 0xFF00FF00) >>  8);
    x = ((x & 0x0000FFFF) << 16) | ((x & 0xFFFF0000) >> 16);
    return x >> (32 - len * 8);
}

This should probably be the fastest and cache friendly implementation.

To implement similar for a custom width integer you need to be able to construct bit mask that has "X bit over X bit set" e.g. 1 bit after 1 (every second bit), 2 after 2 etc. The rest should be self-explanatory.

You can implement other variant based on this. If you need to choose them at runtime, it might be better to just use the widest version (64-bit) all the time to avoid branching.

Note: GCC is able to recognize byte-swap operation in the last two lines before the return statement and is able to generate bswap on x86 and rev on ARM. With 64-bit CPU this make 64-bit version equivalent to the version above in speed.

Apparently you cannot post an algorithm based on common techniques without referencing some paper/book/patent this days. So I am posting a highly patented, copyrighted and in every other way protected generalized version here. Note that you should think twice before using it (because of how patented it is).

Unfortunately, assembly generated out of this code is a complete crap. Only clang is able to somehow understand what is going on and optimize it away.

The original implementation I wrote few years ago was in C++ and heavily relied on templates to help the compiler. In the end it was a horrible and complicated mess, but could generate many bit-oriented algorithms for 8, 16, 32 and 64 integers (and even more if compiler supported them). Nevertheless, this can serve as a description of the algorithm above.

This code implements reverse algorithm for 8, 16, 32, 64 bit words using single instance of a 64 bit code. Helper functions repeat, select can be used as a basic for many bit algorithms (althougth they basically generate required bit masks).

#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/**
 * @brief Mask with `n` bits set.
 */
static uint64_t bits(size_t n)
{
    return (((1ull << n) - 1) | (-uint64_t(n >= 64)));
}

static uint64_t do_repeat(uint64_t x, size_t w, size_t n)
{
    if (n == 0)
        return x;

    const size_t shift = w * (n - 1);
    return (x << shift) | do_repeat(x, w, n - 1);
}

/**
 * @brief Repeat pattern over 64-bit word.
 *
 * @code
 * assert(repeat(       0x1, 32) == 0x5555555555555555);
 * assert(repeat(       0x3, 16) == 0x3333333333333333);
 * assert(repeat(       0xF,  8) == 0x0F0F0F0F0F0F0F0F);
 * assert(repeat(      0xFF,  4) == 0x00FF00FF00FF00FF);
 * assert(repeat(    0xFFFF,  2) == 0x0000FFFF0000FFFF);
 * assert(repeat(0xFFFFFFFF,  1) == 0x00000000FFFFFFFF);

 * assert(repeat(       0x1, 16) == 0x1111111111111111);
 * assert(repeat(      0x12,  8) == 0x1212121212121212);
 * assert(repeat(    0x1234,  4) == 0x1234123412341234);
 * assert(repeat(0x12345678,  2) == 0x1234567812345678);
 * @endcode
 */
static uint64_t repeat(uint64_t x, size_t n)
{
    assert(n != 0);
    return do_repeat(x, 64 / n, n);
}

/**
 * @brief Selects `1 << n` bits over `1 << n` bits.
 *
 * @code
 * assert(select(0) == 0x5555555555555555);
 * assert(select(1) == 0x3333333333333333);
 * assert(select(2) == 0x0F0F0F0F0F0F0F0F);
 * assert(select(3) == 0x00FF00FF00FF00FF);
 * assert(select(4) == 0x0000FFFF0000FFFF);
 * assert(select(5) == 0x00000000FFFFFFFF);
 * @endcode
 */
static uint64_t select(size_t n)
{
    assert(n < 6);
    return repeat(bits(1 << n), 1 << (5 - n));
}

static uint64_t do_reverse(uint64_t x, size_t n)
{
    const size_t shift = (1ull << n);
    const uint64_t lo = select(n) << 0;
    const uint64_t hi = select(n) << shift;

    x = ((x & lo) << shift) | ((x & hi) >> shift);

    if (n == 0)
        return x;

    return do_reverse(x, n - 1);
}

uint64_t reverse64(uint64_t x)
{
    return do_reverse(x, 5);
}

uint32_t reverse32(uint32_t x)
{
    return do_reverse(x, 4);
}

uint16_t reverse16(uint32_t x)
{
    return do_reverse(x, 3);
}

uint8_t reverse8(uint8_t x)
{
    return do_reverse(x, 2);
}

int main()
{
    assert(repeat(       0x1, 32) == 0x5555555555555555);
    assert(repeat(       0x3, 16) == 0x3333333333333333);
    assert(repeat(       0xF,  8) == 0x0F0F0F0F0F0F0F0F);
    assert(repeat(      0xFF,  4) == 0x00FF00FF00FF00FF);
    assert(repeat(    0xFFFF,  2) == 0x0000FFFF0000FFFF);
    assert(repeat(0xFFFFFFFF,  1) == 0x00000000FFFFFFFF);

    assert(repeat(       0x1, 16) == 0x1111111111111111);
    assert(repeat(      0x12,  8) == 0x1212121212121212);
    assert(repeat(    0x1234,  4) == 0x1234123412341234);
    assert(repeat(0x12345678,  2) == 0x1234567812345678);

    assert(select(0) == 0x5555555555555555);
    assert(select(1) == 0x3333333333333333);
    assert(select(2) == 0x0F0F0F0F0F0F0F0F);
    assert(select(3) == 0x00FF00FF00FF00FF);
    assert(select(4) == 0x0000FFFF0000FFFF);
    assert(select(5) == 0x00000000FFFFFFFF);

    assert(reverse8 (              0xA5) == 0xA5);
    assert(reverse16(            0xFEA5) == 0xA57F);
    assert(reverse32(        0xFE0000A5) == 0xA500007F);
    assert(reverse64(0xFE00FE0000A500A5) == 0xA500A500007F007F);

    return 0;
}

You might want to state your source for this algorithm. Otherwise you risk getting accused for plagiarism. — Lundin, Apr 20 '18 at 14:44
If you say so. I just noticed that the nearly identical algorithm can be found here: https://stackoverflow.com/questions/746171/most-efficient-algorithm-for-bit-reversal-from-msb-lsb-to-lsb-msb-in-c — Lundin, Apr 20 '18 at 14:59
@Lundin Identical algorithm can be found in many places on the web. It does not originate from stackoverflow. — , Apr 20 '18 at 14:59
@PeterJ_01 Bits were never mentioned. Input data is 1-8 bytes. — , Apr 20 '18 at 15:00
@Ivan But from you? Where did you first publish it? Some book or paper? — Lundin, Apr 20 '18 at 15:00
@Lundin I don't remember. I have saw many bit hacks. I can write bit reversal, bit search, population count and others without copying them from anwhere. You don't need to be a genoius to write something like this. I think IQ of 110 is more than enough for task like this. — , Apr 20 '18 at 15:01
+1 The code before the last edit was rather general, but the updated version solves OP's problem (up to 4 bytes at least). — vgru, Apr 20 '18 at 15:44
The earliest reference I can find is: Freed, Edwin E. 1983. "Binary Magic Numbers," Dr. Dobb's Journal Vol. 78 (April), pp. 24-37. — Ian Abbott, Apr 20 '18 at 16:41

score 1 · Answer 3 · edited Apr 20 '18 at 15:05

It can be like below, check first and last bit position status, if both bit status(0 or 1) are different, toggle both.

unsigned int data = 0xa0;
for(int start = 0, end = 8*sizeof(data) - 1; start < end; start++, end--) {
        if(data>>start&1 != data>>end&1) { /*check last and 1st bit pos status */ 
                data = data ^ 1<<start;/*complimenting start bit */
                data = data ^ 1<<end;
        }
}

end = 8*sizeof(data) - 1 if input is 8 bit long or 32 bit long, it works for both.

Most Efficient Algorithm for Bit Reversal with undefined size of data

3 Answers3