2

How can I quickly scan groups of 128 bits that are exact equal repeating binary patterns, such 010101... Or 0011001100...?

I have a number of 128 bit blocks, and wish to see if they match the patterns where the number of 1s is equal to number of 0s, eg 010101.... Or 00110011... Or 0000111100001111... But NOT 001001001...

The problem is that patterns may not start on their boundary, so the pattern 00110011.. May begin as 0110011..., and will end 1 bit shifted also (note the 128 bits are not circular, so start doesn't join to the end)

The 010101... Case is easy, it is simply 0xAAAA... Or 0x5555.... However as the patterns get longer, the permutations get longer. Currently I use repeating shifting values such as outlined in this question Fastest way to scan for bit pattern in a stream of bits but something quicker would be nice, as I'm spending 70% of all CPU in this routine. Other posters have solutions for general cases but I am hoping the symmetric nature of my pattern might lead to something more optimal.

If it helps, I am only interested in patterns up to 63 bits long, and most interested in the power of 2 patterns (0101... 00110011... 0000111100001111... Etc) while patterns such as 5 ones/5 zeros are present, these non power 2 sequences are less than 0.1%, so can be ignored if it helps the common cases go quicker.

Other constraints for a perfect solution would be small number of assembler instructions, no wildly random memory access (ie, large rainbow tables not ideal).

Edit. More precise pattern details.

I am mostly interested in the patterns of 0011 and 0000,1111 and 0000,0000,1111,1111 and 16zeros/ones and 32 zeros/ones (commas for readabily only) where each pattern repeats continuously within the 128 bits. Patterns that are not 2,4,8,16,32 bits long for the repeating portion are not as interesting and can be ignored. ( eg 000111... )

The complexity for scanning is that the pattern may start at any position, not just on the 01 or 10 transition. So for example, all of the following would match the 4 bit repeating pattern of 00001111... (commas every 4th bit for readability) (ellipses means repeats identically)

0000,1111.... Or 0001,1110... Or 0011,1100... Or 0111,1000... Or 1111,0000... Or 1110,0001... Or 1100,0011... Or 1000,0111

Within the 128bits, the same pattern needs to repeat, two different patterns being present is not of interest. Eg this is NOT a valid pattern. 0000,1111,0011,0011... As we have changed from 4 bits repeating to 2 bits repeating.

I have already verified the number of 1s is 64, which is true for all power 2 patterns, and now need to identify how many bits make up the repeating pattern (2,4,8,16,32) and how much the pattern is shifted. Eg pattern 0000,1111 is a 4 bit pattern, shifted 0. While 0111,1000... Is a 4 bit pattern shifted 3.

Community
  • 1
  • 1
rlb
  • 1,674
  • 13
  • 18
  • 2
    It would help to have a more formal definition of what a pattern is and is not. Also it would help to know what you're looking for, simply detecting that there's or there's not a pattern or that plus pattern parameters or pattern ID. – Alexey Frunze Mar 22 '13 at 01:41
  • This has the feel of "homework assignment"? Do you have a working solution (albeit slow) that you could post that would satisfy the needs of a homework assignment so we know this isn't a homework assignment? – phonetagger Mar 22 '13 at 01:55
  • @alexey, updated with more formal explanation. – rlb Mar 22 '13 at 02:08
  • @phonetagger. No this is real world, scanning a data stream arriving at 95Mbits/second, and must make some very quick decisions, with limited resource. My current 'slow' is handling the 95Mbit, but 70% of one core, just if I can speed this I have more resource for other areas. – rlb Mar 22 '13 at 02:13
  • @rlb Should your sixth example `Or 1110,0000` be `1110,0001`? – mythagel Mar 22 '13 at 02:59
  • @mythagel. Whoops, fixed thanks – rlb Mar 22 '13 at 03:04
  • This isn't very well thought through, but perhaps something like (pseudocode): `high bit set ? val = ~val; off = ffs(val); val <<= off; val = ~val; pattern = ffs(val);` Then mask with precalculated power of 2 patterns to confirm (Where ffs == find first set which normally is an intrinsic like __builtin_ffs on gcc) – mythagel Mar 22 '13 at 05:21
  • Might be worth posting this to Code Golf. SSE4.2 string instructions could be useful here, too. – Igor Skochinsky Mar 22 '13 at 12:22

4 Answers4

2

Lets start with the case where the patterns do start on their boundary. You can check the first bit and use it to determine your state. Then start looping through your block, check the first bit, increment a count, left shift and repeat until you find that you've gotten the opposite bit. You can now use this initial length as the bitset length. Reset the count to 1 then count the next set of opposite bits. When you switch, check the length against the initial length and error out if they're not equal. Here's a quick function - it seems to work as expected for chars, and it shouldn't be too hard to expand it to deal with blocks of 32 bytes.

unsigned char myblock = 0x33;
unsigned char mask = 0x80, prod = 0x00;
int setlen = 0, count = 0, ones=0;

prod = myblock & mask;
if(prod == 0x80)
  ones = 1;

for(int i=0;i<8;i++){
  prod = myblock & mask;
  myblock = myblock << 1;
  if((prod == 0x80 && ones) || (prod == 0x00 && !ones)){
    count++;
  }else{
    if(setlen == 0) setlen = count;
    if(count != setlen){
      printf("Bad block\n");
      return -1;
    }
    count = 1;
    ones = ( ones == 1 ) ? 0 : 1;
  }
}

printf("Good block of with % repeating bits\n",setlen);
return setlen;

Now to deal with blocks where there's an offset, I'd suggest counting the number of bits until the first 'flip'. Store this number, then run the above routine until you hit the last segment which should have length unequal to the rest of the sets. Add the initial bits to the last segment's length, and then you should be able to compare it with the size of the rest of the sets correctly.

This code is pretty small, and bit shifting through a buffer shouldn't require too much work on the CPU's part. I'd be interested to see how this solution ends up performing against your current one.

ryanbwork
  • 2,123
  • 12
  • 12
  • The idea of counting the second sequence of bits is great. Solves all the repeating patterns not just power of 2, and tight I-stream. As a bonus it has good fail fast semantics. Admitingly, some of the other answers are also very good, but a derivative of this answer is what we've gone with for now. CPU load on dev systems looks much better. – rlb Mar 22 '13 at 07:45
  • You are missing a `d` after `%` in your printf fmt-string. – Jack Mar 23 '13 at 17:29
1

The Generic solution for this kind of problems is to create a good hashing function for the patterns and store each pattern in a hash map. Once you have the hash map created for the patterns then try to lookup in the table using the input stream. I don't have code yet but let me know if you are struck in code.. Please post it and I can work on it..

user1596193
  • 100
  • 3
1

The restriction of the pattern repeating it self all over the 128-stream makes the number of combinations limited and also the sequence will have properties making it easy to check:

One needs to iteratively check if high and low parts are same; if they are opposites, check if that particular length contains consecutive ones.

 8-bit repeat at offset 3:  00011111 11100000 00011111 11100000
 ==> high and low 16 bits are the same
 00011111 11100000 ==> high and low parts are inverted.
 Not same, nor inverted means rejection of pattern.

At that point one needs to check if there's a sequence of ones -- add '1' to the left side and check if it's power of two: n==(n & -n) is the textbook check for that.

Aki Suihkonen
  • 19,144
  • 1
  • 36
  • 57
  • Cunning. I've actually Merged part of this into the other more generic solution, as it is the fastest for common cases. Unfortunately I can only select one answer, so went with the more generic. Thanx – rlb Mar 22 '13 at 07:48
1

I've thought about making a state machine, so every next byte (out of 16) would advance its state and after some 16 state transitions you'd have the pattern identified. But that doesn't look very promising. Data structures and logic look more complex.

Instead, why not precompute all those 126 patterns (from 01 to 32 zeroes + 32 ones), sort them and perform binary search? That would give you at most 7 iterations of binary search. And you don't need to store all 16 bytes of every pattern as its halves are identical. That gives you 126*16/2=1008 bytes for the array of patterns. You also need something like 2 bytes per pattern to store the length of zero (one) runs and the shift relative to whatever pattern you consider unshifted. That's a total of 126*(16/2+2)=1260 bytes of data (should be gentle on the data cache) and very simple and tiny binary search algorithm. Basically, its just an improvement over the answer that you mentioned in the question.

You might want to try switching to linear search after 4-5 iterations of binary search. That may give a small boost to the overall algorithm.

Ultimately, the winner is determined by testing/profiling. And that's what you should do, get a few implementations and compare them on the real data in the real system.

Community
  • 1
  • 1
Alexey Frunze
  • 61,140
  • 12
  • 83
  • 180