Map set of bitmasks to number of occurrences

Question

Let's say I have n bitmasks (n is arbitrary, probably a number from 9-25). The length of these bitmasks (which is equal for all of them) does not matter, but for the sake of an example, let's say we have 3 bitmasks of length 3:

b1: 1 0 0
b2: 1 1 1
b3: 1 0 1

I now want to map those 3 bitmasks (or n in a general case) to 4 (n + 1) other bitmasks, one for each possible number of occurrences. For this example, the resulting bitmasks should be:

0 has no bits set because no bit in the original bitmasks is set 0 times.
1 has the 2nd bit set because it is set exactly one time in the original set of bitmasks
2 has the 3rd bit set because it is set exactly 2 times in the original set of bitmasks
3 has the 1st bit set because it is set exactly 3 times in the original set of bitmasks

Now the question: Does anybody know a clever strategy to achieve this? The easiest attempt to do this is obviously going through each bit and counting the number of occurrences in the original bitmask set. However, I was wondering if there was some clever way of bit manipulation to keep track of all bits at the same time. Creating intermediate bitmasks is fine. The number of operations should of course be below the "naive" way outlined above. Any ideas?

You left a space between each element. Is each one a `char` array or bits of a *single mask*? Their length may well be significant. — Weather Vane, Apr 26 '22 at 13:02
@JohnDoe This is not what the OP wants: it wants to count how many ints has the bit N set for any N, not how many bits are set in each int. — Chayim Friedman, Apr 26 '22 at 13:04
Is there a problem with the obvious solution? Does profiling shows a bottleneck? If not, I would go with it. — Chayim Friedman, Apr 26 '22 at 13:05
@WeatherVane No, the spaces are just for better readability. These are single bitmasks and I displayed the bits there. — Lucas Manzke, Apr 26 '22 at 13:08
So if you can fit the bits into a single integer, the length would matter. What have you tried? One way could be to `AND` the entire masks and apply a popcount and/or other intrinsics. — Weather Vane, Apr 26 '22 at 13:09
@ChayimFriedman The "premature optimization" argument is familiar to me. Of course, that is what I could do, but my intention is also to learn something new. It might well be though that there is no shortcut. — Lucas Manzke, Apr 26 '22 at 13:12
I appreciate the intent to learn, but I highly suspect there is nothing we can do. — Chayim Friedman, Apr 26 '22 at 13:14
@WeatherVane AND and OR for all bitmasks give me one and a half of the outlined bitmasks: The at least 1 (OR) and the 3 (or all) (AND). The ones in between seem more tricky. — Lucas Manzke, Apr 26 '22 at 13:14
Turns out I was wrong, there is a cool way to do that, although I need to polish it a bit (and bench to be sure it is actually faster). — Chayim Friedman, Apr 26 '22 at 13:22
@LucasManzke: I didn't vote to close (in fact, I cast a vote to reopen, since the question is perfectly clear to me), and the link I provided outlines several alternative strategies, if you care to take a look. — Robert Harvey, Apr 26 '22 at 13:39
@RobertHarvey I had a look at the link before writing the comment and while it is indeed interesting, it does not point at a solution to the question. Counting the number of set bits is not what I was asking for. I also did not imply that you voted to close this, just that posting the link might have been hasty. Sorry If I came off as rude, but I think closing such a question a mere 10 minutes after it was asked is not very welcoming, to say the least. — Lucas Manzke, Apr 26 '22 at 13:50
Then I apparently misunderstood your question. What I thought you asked is "is there a way to count bits other than the naïve way?" The link I posted (which I see has now been forcibly deleted by someone else) not only points out the naïve way, but several alternatives, all of which require fewer operations. I never claimed to hand you a complete solution; I assumed that *you're a software developer, and could figure out the rest yourself.* All in all, I think you're right; I found the experience very unwelcoming as well. — Robert Harvey, Apr 26 '22 at 16:51
Does this answer your question? [How to create a byte out of 8 bool values (and vice versa)?](https://stackoverflow.com/questions/8461126/how-to-create-a-byte-out-of-8-bool-values-and-vice-versa) — Chayim Friedman, Apr 26 '22 at 22:16

Jmb · Accepted Answer · 2022-04-27T07:39:04.180

Depending on how many masks you have and how many bits there are in each mask there might be a way to do this with bit twiddling operations. For up to three 6-bits masks, something like this should do it:

let mut counts = 0;
for mask in &masks {
    counts += (mask * 0x0101) & 0xAA55;
}

let mut result = [0; 4];
for i in 0..3 {
    result[(counts >> (2 * i)) & 0x3] |= 1 << (2 * i);
    result[(counts >> (2 * i + 9)) & 0x3] |= 1 << (2 * i + 1);
}

Playground

Edit:

Here's a better solution that works for up to 255 eight bits masks:

let mut counts = 0;
for &mask in &masks {
    counts += ((mask as u64).wrapping_mul (0x8040201008040201) >> 7)
            & 0x0101010101010101;
}
    
let mut result = vec![0; masks.len() + 1];
for i in 0..8 {
    result[(counts >> (8*i)) as usize & 0xFF] |= 1 << (7-i);
}

Playground

Criterion benchmarks show that this solution goes from slightly faster than a naive implementation for a single input mask, to three times as fast for 256 inputs:

Nb inputs	naive	simd	ratio
1	41 ns	39 ns	1.1
16	86 ns	43 ns	2.0
256	737 ns	232 ns	3.2

Here's the benchmark code for reference:

use criterion::{ black_box, criterion_group, criterion_main, Criterion };

fn naive (masks: impl Iterator<Item=u8>) -> Vec<u8> {
   let mut n = 1;
   let mut counts = vec![0; 8];
   for mask in masks {
      for i in 0..8 {
         if (mask >> i) & 0x1 != 0 {
            counts[i] += 1;
         }
      }
      n +=1;
   }

   let mut result = vec![0; n];
   for i in 0..8 {
      result[counts[i]] |= 1 << i;
   }

   result
}

fn simd (masks: impl Iterator<Item=u8>) -> Vec<u8> {
   let mut n = 1;
   let mut counts = 0;
   for mask in masks {
      counts += ((mask as u64).wrapping_mul (0x8040201008040201) >> 7)
         & 0x0101010101010101;
      n += 1;
   }

   let mut result = vec![0; n];
   for i in 0..8 {
      result[(counts >> (8*i)) as usize & 0xFF] |= 1 << (7-i);
   }

   result
}

pub fn bench_naive (c: &mut Criterion) {
   c.bench_function ("naive - 1", |b| b.iter (|| {
      black_box (naive (black_box (0..1))); }));
   c.bench_function ("naive - 16", |b| b.iter (|| {
      black_box (naive (black_box (0..16))); }));
   c.bench_function ("naive - 256", |b| b.iter (|| {
      black_box (naive (black_box (0..=255))); }));
}

pub fn bench_simd (c: &mut Criterion) {
   c.bench_function ("simd - 1", |b| b.iter (|| {
      black_box (simd (black_box (0..1))); }));
   c.bench_function ("simd - 16", |b| b.iter (|| {
      black_box (simd (black_box (0..16))); }));
   c.bench_function ("simd - 256", |b| b.iter (|| {
      black_box (simd (black_box (0..=255))); }));
}

criterion_group!(benches, bench_naive, bench_simd);
criterion_main!(benches);

This is a very extensive answer! How did you come with it? Love the detail, playground and benchmark section. Thanks! — Lucas Manzke, Apr 28 '22 at 17:12
@LucasManzke I came to this solution while browsing through the link given by @RobertHarvey, and specifically [this entry](https://graphics.stanford.edu/~seander/bithacks.html#ReverseByteWith64Bits) which gave me the idea to use a multiplication to spread the bits into a `u64` so that I could create several parallel counters. From there it's just a question of tweaking to get the best result. — Jmb, Apr 29 '22 at 06:34

Map set of bitmasks to number of occurrences

1 Answers1

Edit: