You could use rayon, a data parallelism library that seems like a good fit for your use case. It is very simple to use: just change buf.iter() to buf.par_iter(), and rayon does the rest:
use rayon::prelude::*;

fn is_zero_par(buf: &[u8]) -> bool {
    buf.par_iter().all(|&b| b == 0)
}
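To try it out, add rayon to your Cargo.toml. The version requirement below is only an example; any recent 1.x release should work:

[dependencies]
rayon = "1"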
For a vector of 20 million elements, rayon gives roughly a 7x speedup:
#![feature(test)]

extern crate test;

use rayon::prelude::*;

// 20 million zero bytes.
fn v() -> Vec<u8> {
    std::iter::repeat(0).take(20_000_000).collect()
}

// Sequential version.
fn is_zero(buf: &[u8]) -> bool {
    buf.iter().all(|&b| b == 0)
}

// Parallel version.
fn is_zero_par(buf: &[u8]) -> bool {
    buf.par_iter().all(|&b| b == 0)
}

#[cfg(test)]
mod tests {
    use super::*;

    #[bench]
    fn bench_is_zero(b: &mut test::Bencher) {
        let v = test::black_box(v());
        b.iter(|| is_zero(&v[..]))
    }

    #[bench]
    fn bench_is_zero_par(b: &mut test::Bencher) {
        let v = test::black_box(v());
        b.iter(|| is_zero_par(&v[..]))
    }
}
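Since #![feature(test)] and test::Bencher are unstable, this needs a nightly toolchain (e.g. cargo +nightly bench). The output: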
running 2 tests
test tests::bench_is_zero ... bench: 7,217,686 ns/iter (+/- 478,845)
test tests::bench_is_zero_par ... bench: 1,080,959 ns/iter (+/- 111,692)
Note that the performance benefit of multi-threading depends on the workload (the number of elements); for small buffers, the threading overhead can make the parallel version slower than the sequential one, so you may want to fall back to the sequential iterator below a size threshold, as sketched below.
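If you want a single function that behaves reasonably for both cases, one option is to dispatch on the buffer length. This is only a sketch: is_zero_adaptive and PAR_THRESHOLD are names I made up, and the cut-off value is a placeholder that should be tuned by benchmarking on your target hardware:

use rayon::prelude::*;

// Hypothetical cut-off; benchmark on your own machine to pick a real value.
const PAR_THRESHOLD: usize = 64 * 1024;

fn is_zero_adaptive(buf: &[u8]) -> bool {
    if buf.len() < PAR_THRESHOLD {
        // Small buffers: stay single-threaded to avoid thread-pool overhead.
        buf.iter().all(|&b| b == 0)
    } else {
        // Large buffers: let rayon split the work across its thread pool.
        buf.par_iter().all(|&b| b == 0)
    }
}

Rayon also has IndexedParallelIterator::with_min_len to limit how finely a parallel iterator gets split, but the work still goes through the thread pool, so an explicit length check is the more direct way to keep small inputs entirely sequential.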