shift bits while retaining pattern

Question

My apologies if this has been asked/answered before but I'm honestly not even sure how to word this as a question properly. I have the following bit pattern:

0110110110110110110110110110110110110110110110110110110110110110

I'm trying to perform a shift that'll preserve my underlying pattern; my first instinct was to use right rotation ((x >> count) | (x << (-count & 63))) but the asymmetry in my bit pattern results in:

0011011011011011011011011011011011011011011011011011011011011011 <--- wrong

The problem is that the most significant (far left) bit ends up being 0 instead of the desired 1:

1011011011011011011011011011011011011011011011011011011011011011 <--- right

Is there a colloquial name for this function I'm looking for? If not, how could I go about implementing this idea?

Additional Information:

While the question is language agnostic I'm currently trying to solve this using C#.
The bit patterns I'm using are entirely predictable and always have the same structure; the pattern starts with a single zero followed by n - 1 ones (where n is an odd number) and then repeats infinitely.
I'd like to accomplish this without conditional operations since they'd defeat the purpose of using bitwise manipulation in the first place but maybe I have no choice...

https://stackoverflow.com/questions/35167/is-there-a-way-to-perform-a-circular-bit-shift-in-c this? — Kevin Gosse, Aug 09 '17 at 19:34
rotation is the correct term, and to me it looks like you have the "right" and "wrong" results mixed up. That is, you dropped a `0` off the right side, so I expect a `0` on the left. — Ben Voigt, Aug 09 '17 at 19:34
@BenVoigt You're right, I described what I want incorrectly; edited question to reflect as much. — Kittoes0124, Aug 09 '17 at 19:34
The requirement is a bit mysterious, what does it mean to preserve the pattern (while shifting anyway)? — harold, Aug 09 '17 at 19:38
@harold I need to preserve the positions of the zeros in relation to each other. — Kittoes0124, Aug 09 '17 at 19:42
How much is `count` in your example? Are you using `long` or `ulong`? — Olivier Jacot-Descombes, Aug 09 '17 at 19:42
@OlivierJacot-Descombes `count` is a parameter that depends on the bit pattern, in the examples I demoed the value is 1. Data type is irrelevant as I only care about bits but the example uses `ulong` in order to perform the circular shift properly (though, as discovered, circular shift is not what I need). — Kittoes0124, Aug 09 '17 at 19:45
The problem seems to be that your bit pattern repeats a pattern every 3 bits. But since a 64-bit number is not divisible by 3 a bit rotation inevitably destroys the pattern. What kind of problem are you trying to solve? [What is the XY problem?](https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem) — Olivier Jacot-Descombes, Aug 09 '17 at 19:58
So is it correct to choose as the new top bit (assuming we shift right by 1), a 0 iff the top n-1 bits are 1, a 1 otherwise? Of course this is doable with bitwise operation, it's easy if n is a compile time constant (is it?) — harold, Aug 09 '17 at 19:58
You say that the pattern is 0 followed by odd numbers of 1. In your example, you have the pattern 0 followed by 2 1's. Can you update the question to show a right example of the number you want to rotate? — displayName, Aug 09 '17 at 20:00
@displayName I actually said that `n` is an odd number which means that there will always be an even number of ones since `n - 1` is always even when n is odd. — Kittoes0124, Aug 09 '17 at 20:01
@OlivierJacot-Descombes You're exactly right; the misalignment with the machine's word size will always end up destroying my patterns when a circular shift is performed and that's the exact problem I'm trying to solve. These bit patterns represent wheel factorization on an infinite number line but my machine has a fixed word size, I need to realign the patterns in every `n` bits in order to preserve my maths. — Kittoes0124, Aug 09 '17 at 20:07
Do you know in advance the number `n`? Or would that have to be detected from the data? — Ben Voigt, Aug 09 '17 at 20:13
@BenVoigt I do, but need logic that can be generalized to any `n`. If it helps, I'm testing with 3, 5, and 7. — Kittoes0124, Aug 09 '17 at 20:16
Is it acceptable to change this problem from "shifting" into "extracting"? As in, from an infinite stream of ones and zeroes in the correct format, extract 64 of them at a given offset. In the implementation there would be no need to store infinite bits, and extracting a range of bits is not so bad (couple of shifts and an OR). This seems easier than doing it incrementally. — harold, Aug 09 '17 at 21:06
@harold Don't think so since the only thing I know ahead of time are the patterns that I apply to an (imaginary) infinite stream of all 1s. After the patterns are applied I'm left with indices that are potentially prime. — Kittoes0124, Aug 09 '17 at 21:10
Then couldn't you extract a range of bits (can even be 64-aligned) from every generator that you want, and AND them together? That's really just an optimization of "loop over the array and set every multiple of x to 'composite'" but with 64 bits at a time, naturally only useful for small x — harold, Aug 09 '17 at 21:19
@harold That's what I'm trying to build via these shifts I think. A 64-bit generator that spits out the 64-bit pattern for a given `n` and `offset` would solve my problem. — Kittoes0124, Aug 09 '17 at 21:30
There is your branchless answer, without conditional operators. — displayName, Aug 09 '17 at 21:50
@displayName Much obliged, I'm a bit swamped with actual duties at the moment but am very excited to experiment with your code. — Kittoes0124, Aug 09 '17 at 21:53
In [this answer](https://codereview.stackexchange.com/a/171290/129505) I solve exactly the same problem as above (with 0s and 1s reversed) using lookup tables. You have to skip down to the section "Bitmaps FTW" since the first part is looking at a different algorithm. The size of the lookup table is fairly limited because of the repetitive nature of the streams, you only need at most P byte-wise "jump ins" for a pattern of period P. — BeeOnRope, Aug 10 '17 at 21:36

Ben Voigt · Accepted Answer · 2017-08-09T21:56:48.080

You've got a number structured like this:

B16  B15  B14  B13  B12  B11  B10  B09  B08  B07  B06  B05  B04  B03  B02  B01  B00

  ?    0    1    1    0    1    1    0    1    1    0    1    1    0    1    1    0

The ? needs to appear in the MSB (B15, or B63, or whatever) after the shift. Where does it come from? Well, the closest copy is found n places to the right:

B13    0    1    1    0    1    1    0    1    1    0    1    1    0    1    1    0
 ^--------------/

If your word has width w, this is 1 << (w-n)

So you can do:

var selector = 1 << (w-n);
var rotated = (val >> 1) | ((val & selector) << (n-1));

But you may want a multiple shift. Then we need to build a wider mask:

  ?    0    1    1    0    1    1    0    1    1    0    1    1    0    1    1    0
            *    *    *    *    *

Here I've chosen to pretend n = 6, it just needs to be a multiple of the basic n, and larger than shift. Now:

var selector = ((1UL << shift) - 1) << (w - n);
var rotated = (val >> shift) | ((val & selector) << (n - shift));

Working demonstration using your pattern: http://rextester.com/UWYSW47054

It's easy to see that the output has period 3, as required:

  1:B6DB6DB6DB6DB6DB
  2:DB6DB6DB6DB6DB6D
  3:6DB6DB6DB6DB6DB6
  4:B6DB6DB6DB6DB6DB
  5:DB6DB6DB6DB6DB6D
  6:6DB6DB6DB6DB6DB6
  7:B6DB6DB6DB6DB6DB
  8:DB6DB6DB6DB6DB6D
  9:6DB6DB6DB6DB6DB6
 10:B6DB6DB6DB6DB6DB
 11:DB6DB6DB6DB6DB6D

Absolutely love the lucid explanation as to why the method works and the constant refinement to make things even clearer. — Kittoes0124, Aug 09 '17 at 22:01

displayName · Answer 2 · 2017-08-09T21:49:04.080

1

Branchless Answer after a poke by @BenVoigt:

Get the last bit b by doing (n & 1);
Return n >> 1 | b << ((sizeof(n) - 1).

Original Answer:

Get the last bit b by doing (n & 1);
If b is 1, right shift the number by 1 bit and bitwise-OR it with 1 << (sizeof(n) - 1);
If b is 0, just right shift the number by 1 bit.

edited Aug 09 '17 at 21:49

answered Aug 09 '17 at 19:47

displayName

13,888
8
60
75

close, but you have a branch for no reason – Ben Voigt Aug 09 '17 at 21:40
@BenVoigt: Thank You. – displayName Aug 09 '17 at 21:49

Olivier Jacot-Descombes · Answer 3 · 2017-08-09T21:40:47.440

1

Instead of storing a lot of repetitions of a pattern, just store one recurrence and apply modulo operations on the indexes

byte[] pattern = new byte[] { 0, 1, 1 };

// Get a "bit" at index "i", shifted right by "shift"
byte bit = pattern[(i - shift + 1000000 * byte.Length) % byte.Length];

The + 1000000 * byte.Length must be greater than the greatest expected shift and ensures that we get a posistive sum.

This allows you to store patterns of virtually any length.

An optimization would be to store a mirrored version of the pattern. You could then shift left instead of right. This would simplify the index calculation

byte bit = pattern[(i + shift) % byte.Length];

edited Aug 09 '17 at 21:40

answered Aug 09 '17 at 21:34

Olivier Jacot-Descombes

104,806
13
138
188

Modulo is *way* more expensive than any bitwise operation, and you seem to only generate one bit at a time, which means a bunch of these operations to regenerate the whole integer. – Ben Voigt Aug 09 '17 at 21:42
+1. Slow as all hell so I won't be able to use it for my purposes but I do love the simplicity and it got me thinking. – Kittoes0124 Aug 09 '17 at 21:54

harold · Answer 4 · 2017-08-09T22:17:32.267

The problem was changed a bit through the comments.

For all reasonable n, the following problem can be solved efficiently after minimal pre-computation:

Given an offset k, get 64 bits starting at that position in the stream of bits that follows the pattern of (zero, n-1 ones) repeating.

Clearly the pattern repeats with a period of n, so only n different ulongs have to be produced for every given value of n. That could either be done explicitly, constructing all of them in pre-processing (they could be constructed in any obvious way, it doesn't really matter since that only happens once), or left more implicitly by storing only two ulongs per value for n (this works under the assumption that n < 64, see below) and then extracting a range from them with some shifting/ORing. Either way, use offset % n to compute which pattern to retrieve (since the offset is increasing in a predictable manner, no actual modulo operation is required^[1]).

Even with the first method, memory consumption will be reasonable since this optimization is only an optimization for low n: in particular for n > 64 there will be fewer than 1 zero per word on average, so the "old fashioned way" of visiting every multiple of n and resetting that bit starts to skip work while the above trick would still visit every word and would not be able anymore to reset multiple bits at once.

[1]: if there are multiple n's in play at the same time, a possible strategy is keeping an array offsets where offsets[n] = offset % n, which could be updated according to: (not tested)

int next = offsets[n] + _64modn[n]; // 64 % n precomputed
offsets[n] = next - (((n - next - 1) >> 31) & n);

The idea being that n is subtracted whenever next >= n. Only one subtraction is needed since the offset and thing added to the offset are already reduced modulo n.

This offset-increment can be done with System.Numerics.Vectors, which is very feature-poor compared to actual hardware but is just about able to do this. It can't do the shift (yes, it's weird) but it can implement a comparison in a branchless way.

Doing one pass per value of n is easier, but touches lots of memory in a cache unfriendly manner. Doing lots of different n at the same time may not be great either. I guess you'd just have to bechmark that..

Also you could consider hard-coding it for some low numbers, something like offset % 3 is fairly efficient (unlike offset % variable). This does take manual loop-unrolling which is a bit annoying, but it's actually simpler, just big in terms of lines of code.

I try pretty much all these optimizations and more in [this answer](https://codereview.stackexchange.com/a/171290/129505) which seems to be solving almost exactly the same problem. The OP doesn't make it clear if they'll be processing several streams with different periods at the same time, but I'm guessing yes, so the "wrap around" becomes important to do branch-free since otherwise it is unpredictable. — BeeOnRope, Aug 10 '17 at 21:38
I'm not totally sure about C# compilers, but at least with good C and C++ compilers you don't need the explicit `>> 31` version to avoid the branch for modulo counting, but can use a plain conditional, like `next = next > n ? next - n : next` which ends producing branch-free code using `cmov` mostly (`gcc` seems to like `sbb` tricks on older platforms). — BeeOnRope, Aug 10 '17 at 21:57
@BeeOnRope I don't know what RyuJIT is up to these days, but the older JIT used to be pretty bad at that sort of thing, I should probably actually check it.. — harold, Aug 10 '17 at 21:59
@BeeOnRope nowhere that I know of.. and the codegen for things like that is still branchy unfortunately. Maybe someday. — harold, Aug 10 '17 at 22:17

shift bits while retaining pattern

4 Answers4