Reverse the order of bits in a bit array

Question

I have a long sequence of bits stored in an array of unsigned long integers, like this

struct bit_array
{
    int size; /* nr of bits */
    unsigned long *array; /* the container that stores bits */
}

I am trying to design an algorithm to reverse the order of bits in *array. Problems:

size can be anything, i.e. not necessarily a multiple of 8 or 32 etc, so the first bit in the input array can end up at any position within the unsigned long in the output array;
the algorithm should be platform-independent, i.e. work for any sizeof(unsigned long).

Code, pseudocode, algo description etc. -- anything better than bruteforce ("bit by bit") approach is welcome.

"[T]he _first_ bit in the input array can end up at any position within the unsigned long in the output array"? I'm not sure I understand. Wouldn't the first bit be at the first position in the first long? Don't you mean the _last_ bit? — Thomas Padron-McCarthy, Nov 25 '15 at 07:12
If the number of bits isn't a multiple of sizeof(long), what are the extra bits in the last long used for? Are they garbage (and can therefore be inverted), or do they have to be preserved as is? — Thomas Padron-McCarthy, Nov 25 '15 at 07:15
I think the issue is that if there are 57 bits in the bit array, the bit number 0 needs to swap with bit number 56. However, before we can do anything, we need to know whether bit 0 of the array is stored in the MSB or LSB of element 0 of the array (or, if element 0 isn't where bit 0 is, we need to understand where bit 0 is stored). — Jonathan Leffler, Nov 25 '15 at 07:17
@Thomas Padron-McCarthy, I'm sorry for bad description. Let's take an example: a long is 32 bits and we need to store, say, 49 bits. That means we need an array of 2 longs (the last 15 bits of the second long are wasted). After we reverse the bits, the first bit (position 0 in array[0]) will become the last (position 16 in array[0]). — Eques, Nov 25 '15 at 07:18
@Jonathan and Eques: Aha, it's about _reversing the order_! I thought it was just inverting each bit. Sorry for the misunderstanding. — Thomas Padron-McCarthy, Nov 25 '15 at 07:18
Yes, it's about reversing the order. I will edit the question to make it clear. — Eques, Nov 25 '15 at 07:19
Why not just add an extra two fields to the structure to define the direction and number of bits to skip? Then create the procedures to access it to depend on the direction? — Ed Heal, Nov 25 '15 at 07:24
@Ed Heal, because that means that all other algorithms (bitwise comparison, logical operations etc) will have to be redesigned. — Eques, Nov 25 '15 at 07:29
Do you have control over the definition of this structure? And if you do, why store bits in `unsigned long *`, not in `uint8_t *`. That would factor out the platform-dependent issues. — Dmitry Grigoryev, Nov 25 '15 at 11:26
@DmitryGrigoryev: yes but it would probably slow down other more common operations on bit slices such as CLR, SET, OR, AND, SET, XOR... as proper alignment for the `array` member could no longer be assumed. — chqrlie, Nov 26 '15 at 14:12
Well you could pick `uint32_t *` for performance, but then you'll have to deal with little vs big endianness. Performance and portability don't really go together. — Dmitry Grigoryev, Nov 26 '15 at 15:58
Because in terms of "bits per CPU operation" this is probably the fastest choice. — Eques, Nov 27 '15 at 06:40

score 8 · Answer 1 · 2015-11-25T07:18:05.790

8

My favorite solution is to fill a lookup-table that does bit-reversal on a single byte (hence 256 byte entries).

You apply the table to 1 to 4 bytes of the input operand, with a swap. If the size isn't a multiple of 8, you will need to adjust by a final right shift.

This scales well to larger integers.

Example:

11 10010011 00001010 -> 01010000 11001001 11000000 -> 01 01000011 00100111

To split the number into bytes portably, you need to use bitwise masking/shifts; mapping of a struct or array of bytes onto the integer can make it more efficient.

For brute performance, you can think of mapping up to 16 bits at a time, but this doesn't look quite reasonable.

edited Nov 25 '15 at 07:18

answered Nov 25 '15 at 07:15

1

The question seems be a bit concerned with efficiency ("anything better than bruteforce" etc), and would this lookup table really be as efficient as bit operations on longs? I don't really know one way or the other, but until I see measurements I suspect that this could be very much slower. – Thomas Padron-McCarthy Nov 25 '15 at 07:17
2

Lookup in a data structure, that at best is in the highest-level cache, compared to bit operations in a register. – Thomas Padron-McCarthy Nov 25 '15 at 07:20
1

@ThomasPadron-McCarthy: Depends on the target, of course. On an 8 bit CPU the table will be much faster. Because thes often don't have single-cycle shifters for arbitrary counts and work on 8 bit data anyway. For 16 bit it depends. For 32 bit it will likely be slower, much slower if the CPU has bit-reverse instructions. – too honest for this site Nov 25 '15 at 07:20
1

@Olaf: Yes, probably very architecture-dependent. My mind always seems to be stuck in modern Intel desktop processors. – Thomas Padron-McCarthy Nov 25 '15 at 07:21
1

@ThomasPadron-McCarthy: There are still some other "modern" processors. x86 actually is a minority. – too honest for this site Nov 25 '15 at 07:23
2

@Olaf: Not in my desktop computer... But yeah, I need to get out more. – Thomas Padron-McCarthy Nov 25 '15 at 07:24
1

@ThomasPadron-McCarthy: if you provide an alternative, we can benchmark comparatively. – Nov 25 '15 at 07:29
1

@ThomasPadron-McCarthy: If you have a recent x86 CPU chances are you have an ARM core inside. But just look at your smartphone, router, washing machine, etc. – too honest for this site Nov 25 '15 at 07:29
1

@Olaf: ARM inside Intel ?? – Nov 25 '15 at 07:32
1

@YvesDaoust: I just wrote x86; not Intel. AMD uses an ARM code for the security system, not sure if Intel also uses ARM cores, I had to research on that, but it is well possible, as they have the licenses. – too honest for this site Nov 25 '15 at 07:36
1

Regardless of benchmarks it's clear to say that bitwise operations will be faster than lookups. A shift operation is 1 or 2 ops whereas lookup might be > 30. – erip Nov 25 '15 at 14:46
1

@erip: could be even more if the data is input from the keyboard. – Nov 25 '15 at 16:16
2

@erip: 30 cycles sounds very pessimistic for latency to L1 cache. And even then the latency might be irrelevant; an optimized loop will likely be limited primarily by bandwidth. (or maybe even by compute) – Nov 25 '15 at 16:47

AndreyS Scherbakov · Answer 2 · 2015-11-26T08:45:53.767

I like the idea of lookup table. Still it's also a typical task for log(n) group bit tricks that may be very fast. Like:

unsigned long reverseOne(unsigned long x) {
  x = ((x & 0xFFFFFFFF00000000) >> 32) | ((x & 0x00000000FFFFFFFF) << 32);
  x = ((x & 0xFFFF0000FFFF0000) >> 16) | ((x & 0x0000FFFF0000FFFF) << 16);
  x = ((x & 0xFF00FF00FF00FF00) >> 8)  | ((x & 0x00FF00FF00FF00FF) << 8);
  x = ((x & 0xF0F0F0F0F0F0F0F0) >> 4)  | ((x & 0x0F0F0F0F0F0F0F0F) << 4);
  x = ((x & 0xCCCCCCCCCCCCCCCC) >> 2)  | ((x & 0x3333333333333333) << 2);
  x = ((x & 0xAAAAAAAAAAAAAAAA) >> 1)  | ((x & 0x5555555555555555) << 1);
  return x;
}

The underlying idea is that when we aim to reverse the order of some sequence we may swap the head and tail halves of this sequence and then separately reverse each of halves (which is done here by applying the same procedure recursively to each half).

Here is a more portable version supporting unsigned long widths of 4,8,16 or 32 bytes.

#include <limits.h>

#define ones32 0xFFFFFFFFUL
#if (ULONG_MAX >> 128)
#define fill32(x) (x|(x<<32)|(x<<64)|(x<<96)|(x<<128)|(x<<160)|(x<<192)|(x<<224))
#define patt128 (ones32|(ones32<<32)|(ones32<<64) |(ones32<<96))
#define patt64  (ones32|(ones32<<32)|(ones32<<128)|(ones32<<160))
#define patt32  (ones32|(ones32<<64)|(ones32<<128)|(ones32<<192))
#else
#if (ULONG_MAX >> 64)
#define fill32(x) (x|(x<<32)|(x<<64)|(x<<96))
#define patt64  (ones32|(ones32<<32))
#define patt32  (ones32|(ones32<<64))
#else
#if (ULONG_MAX >> 32)
#define fill32(x) (x|(x<<32))
#define patt32  (ones32)
#else
#define fill32(x) (x)
#endif
#endif
#endif

unsigned long reverseOne(unsigned long x) {
#if (ULONG_MAX >> 32)
#if (ULONG_MAX >> 64)
#if (ULONG_MAX >> 128)
  x = ((x & ~patt128) >> 128) | ((x & patt128) << 128);
#endif
  x = ((x & ~patt64) >> 64) | ((x & patt64) << 64);
#endif
  x = ((x & ~patt32) >> 32) | ((x & patt32) << 32);
#endif
  x = ((x & fill32(0xffff0000UL)) >> 16) | ((x & fill32(0x0000ffffUL)) << 16);
  x = ((x & fill32(0xff00ff00UL)) >> 8)  | ((x & fill32(0x00ff00ffUL)) << 8);
  x = ((x & fill32(0xf0f0f0f0UL)) >> 4)  | ((x & fill32(0x0f0f0f0fUL)) << 4);
  x = ((x & fill32(0xccccccccUL)) >> 2)  | ((x & fill32(0x33333333UL)) << 2);
  x = ((x & fill32(0xaaaaaaaaUL)) >> 1)  | ((x & fill32(0x55555555UL)) << 1);
  return x;
}

This code is not portable and has great potential for subtle integer type bugs. You need to change unsigned long to `uint64_t` and also ensure that the type of all of the integer literals is `uint64_t` by using `const uint64_t` variables, instead of using magic numbers. For example, the literal `0x00000000FFFFFFFF` is a 32 bit unsigned int, not a 64 bit signed int as you might have assumed, because adding lots of zeroes in the beginning of a literal will not make it a 64 bit type. > — Lundin, Nov 25 '15 at 15:48
Now imagine if `int` is 64 bit on the given system: suddenly there will integer promotion of this literal and suddenly it is now a signed type. Why do use signed literals in the first place? You almost never want signed types when doing "bit fiddling" operations. The algorithm proposed here is pretty good, but C code like this is a ticking bomb. — Lundin, Nov 25 '15 at 15:48

score 2 · Answer 3 · answered Nov 25 '15 at 07:14

2

In a collection of related topics which can be found here, the bits of an individual array entry could be reversed as follows.

unsigned int v;     // input bits to be reversed
unsigned int r = v; // r will be reversed bits of v; first get LSB of v
int s = sizeof(v) * CHAR_BIT - 1; // extra shift needed at end

for (v >>= 1; v; v >>= 1)
{   
  r <<= 1;
  r |= v & 1;
  s--;
}
r <<= s; // shift when v's highest bits are zero

The reversal of the entire array could be done afterwards by rearranging the individual positions.

answered Nov 25 '15 at 07:14

Codor

17,447
9
29
56

3

The variable names "input", "output" and "size"/"count" would make more sense. Don't just copy/paste solutions from that site, it is notorious for poor code readability. – Lundin Nov 25 '15 at 07:30
1

Thanks for the comment; basically I just did copy & paste in order not to introduce errors. However, perhaps a selfmade implementation would have been better. – Codor Nov 25 '15 at 07:47
1

The final reversal is not so easy to do. – chqrlie Nov 25 '15 at 08:11

chqrlie · Answer 4 · 2015-11-25T08:18:48.757

You must define what is the order of bits in an unsigned long. You might assume that bit n is corresponds to array[x] & (1 << n) but this needs to be specified. If so, you need to handle the byte ordering (little or big endian) if you are going to use access the array as bytes instead of unsigned long.

I would definitely implement brute force first and measure whether the speed is an issue. No need to waste time trying to optimize this if it is not used a lot on large arrays. An optimized version can be tricky to implement correctly. If you end up trying anyway, the brute force version can be used to verify correctness on test values and benchmark the speed of the optimized version.

user694733 · Answer 5 · 2015-11-25T08:54:06.447

2

The fact that the size is not multiple of sizeof(long) is the hardest part of the problem. This can result in a lot of bit shifting.

But, you don't have to do that if you can introduce new struct member:

struct bit_array
{
    int size; /* nr of bits */
    int offset; /* First bit position */
    unsigned long *array; /* the container that stores bits */
}

Offset would tell you how many bits to ignore at the beginning of the array.

Then you only only have to do following steps:

Reverse array elements.
Swap bits of each element. There are many hacks for in the other answers, but your compiler might also provide intrisic functions to do it in fewer instructions (like RBIT instruction on some ARM cores).
Calculate new starting offset. This is equal to unused bits the last element had.

edited Nov 25 '15 at 08:54

answered Nov 25 '15 at 08:47

user694733

15,208
2
42
68

1

Why not use an `uint8_t*` for the array instead? – Lundin Nov 25 '15 at 15:39
1

@Lundin OP stated that size is not multiples of 8 either, so I don't think that helps. In fact, I think the best data type for array element should be chosen depending on the size of the CPU register on target architecture. – user694733 Nov 26 '15 at 08:18
1

You can still access the data by chunks of "CPU register size" even if the type is `uint8_t`. You can't do the opposite though, so on small CPUs the unsigned long will be very burdensome. – Lundin Nov 26 '15 at 08:38
1

@Lundin True, I don't think that unsigned long is an optimal data type. But with `uint8_t` on 32-bit CPU (for example) you'd have to take care of the alignment manually and treat last element specially if array size is not divisible by four, if you wanted to work on 4 byte chunks. – user694733 Nov 26 '15 at 09:02

Olaf Seibert · Answer 6 · 2015-11-25T14:44:26.690

I would split the problem into two parts.

First, I would ignore the fact that the number of used bits is not a multiple of 32. I would use one of the given methods to swap around the whole array like that.

pseudocode:

for half the longs in the array:
    take the first longword;
    take the last longword;
    swap the bits in the first longword
    swap the bits in the last longword;

    store the swapped first longword into the last location;
    store the swapped last longword into the first location;

and then fix up the fact that the first few bits (call than number n) are actually garbage bits from the end of the longs:

for all of the longs in the array:
    split the value in the leftmost n bits and the rest;
    store the leftmost n bits into the righthand part of the previous word;
    shift the rest bits to the left over n positions (making the rightmost n bits zero);
    store them back;

You could try to fold that into one pass over the whole array of course. Something like this:

for half the longs in the array:
    take the first longword;
    take the last longword;
    swap the bits in the first longword
    swap the bits in the last longword;

    split both value in the leftmost n bits and the rest;

    for the new first longword:
        store the leftmost n bits into the righthand side of the previous word;
        store the remaining bits into the first longword, shifted left;

    for the new last longword:
        remember the leftmost n bits for the next iteration;
        store the remembered leftmost n bits, combined with the remaining bits, into the last longword;

    store the swapped first longword into the last location;
    store the swapped last longword into the first location;

I'm abstracting from the edge cases here (first and last longword), and you may need to reverse the shifting direction depending on how the bits are ordered inside each longword.

Reverse the order of bits in a bit array

6 Answers6

Linked