5

Given:

  • A bitmask a (say, std::uint64_t), which contains at least one set (1) bit.
  • A selector bitmask b which is a subset of a (i.e. a & b == b), and has at least one bit set.

I want to select spans of contiguous 1-bits in a which overlap with a bit in b:

a = 0b1111001110001100;
b = 0b0000001010001000;
//c=0b0000001110001100
//    XXXX  YYY   ZZ

The XXXX group is 0 in c because b & XXXX is false. The ZZ group is copied because b has one of the Z bits set. The YYY group is also set in c for the same reason. Notice that b can have multiple set bits in a single group in a.

So for every contiguous group of 1s in a, set all of those bits in c if b has a 1 in any of those positions. A more complex example:

std::uint64_t a = 0b1101110110101;
std::uint64_t b = 0b0001010010001;
// desired   c == 0b0001110110001
// contiguous groups   ^^^ ^^   ^  that overlap with a 1 in b

assert(a & b == b);           // b is a subset of a

std::uint64_t c = some_magic_operation(a, b);
assert(c == 0b0001110110001);

Are there any bit-logic instructions/intrinsics (MMX, SSE, AVX, BMI1/BMI2), or bit manipulation tricks which allows me to calculate c from a and b efficiently? (i.e. without loops)?


ADDITIONAL:

Using hint from Denis' answer I can only imagine loop-based algorithm:

std::uint64_t a = 0b0110111001001101;
std::uint64_t b = 0b0100101000001101;
assert(a & b == b); // subset

std::cout << std::bitset< 16 >(a) << std::endl;
std::cout << std::bitset< 16 >(b) << std::endl;
std::uint64_t x = (a + b) & ~a;
std::uint64_t c = 0;
while ((x = (a & (x >> 1)))) { // length of longest 1-series times
    c |= x;
}
std::cout << std::bitset< 16 >(c) << std::endl;
Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
Tomilov Anatoliy
  • 15,657
  • 10
  • 64
  • 169
  • does "a segment" include segments of length 1? – M.M May 19 '16 at 04:44
  • @M.M Yes, 1-segment is just non-zero-length sequence of 1-s. – Tomilov Anatoliy May 19 '16 at 04:46
  • 2
    It took me about 10 minutes to decode your description, so I made an edit to clarify for future readers. I assume you left out Intel's BMI1/BMI2 extensions by accident, not because [Haswell is too new](https://en.wikipedia.org/wiki/Bit_Manipulation_Instruction_Sets#Supporting_CPUs). You did mention AVX, which is by no means baseline: not supported even on [Skylake Celeron/Pentium](http://ark.intel.com/products/90732), only i3 and higher. Thanks, Intel :( – Peter Cordes May 19 '16 at 06:43
  • Actually, none of the BMI instructions are obviously useful on their own. Possibly `pext` / `pdep` as building blocks, though. Do you absolutely need the output in the format you specified? Denis's answer loses the information about how big each group was, but does have a 1 in the result for each matching group. – Peter Cordes May 19 '16 at 06:55
  • @PeterCordes Thank you for correction. I'm very grateful. – Tomilov Anatoliy May 19 '16 at 08:01
  • @PeterCordes I need a (selected) parts of mask. What format can can be helpful for someone you can propose? – Tomilov Anatoliy May 19 '16 at 08:03
  • What do you do with the resulting mask? Is there a way you can modify the code/algorithm that uses the result, to allow a simpler calculation? I don't have anything specific in mind, other than Denis's output format (`(a+b)&~a`). If `b` only had a 1 in the low bit of a group, we could maybe do `( (a+b)&~a ) - b` so borrow would turn groups back into all-1s, stopping at the 1bit from Denis's expression. Is there any chance you can arrange for `b` to have that extra condition? – Peter Cordes May 19 '16 at 08:14
  • @PeterCordes In my real case - no. Many bits of selector may "hit" particular 1-series of source bitmask. – Tomilov Anatoliy May 19 '16 at 08:19
  • @Orient: The question [here](https://stackoverflow.com/q/56467971/2439725) is probably a duplicate of your question. Nevertheless, you might be interested in my [SIMD optimized solution](https://stackoverflow.com/a/56478996/2439725). – wim Jun 06 '19 at 17:41

1 Answers1

4

In case of uint64_t you may do this trick:

Let's set a = 0b11011101101. Having at least one 0 bit is important. The bitmask has 4 separate areas, filled with 1 bits. If you do c=a+(a&b), then each 1-filled area will overflow if at least one bit of b in this area is set. So you can check then, which area was overflown. For example, if you want 1-bits in 2-nd and 3-rd areas of a, you may do so:

    assert(c & 0b00100010000);
    //              ^^^ ^^ this segments overflows
Denis Sheremet
  • 2,453
  • 2
  • 18
  • 34