Playing with Rust
iterators and closures, I've implemented an N
prime numbers generator (original playground / improved code).
It performs very poorly (about 10 times slower) compared with equivalent functional code in another language I'm more proficient with, on the same box.
Some numbers:
test calc_primes ... bench: 7,754,795 ns/iter (+/- 1,887,591)
#[macro_use]
extern crate bencher;
use bencher::Bencher;
use nth_prime::*;
fn calc_primes(b: &mut Bencher) {
b.iter(|| primes(10000));
}
benchmark_group!(benches, calc_primes);
benchmark_main!(benches);
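For completeness, the bencher harness above needs wiring in Cargo.toml roughly like this (the version and the bench-file name are assumptions on my part, not taken from the original setup):

```toml
[dev-dependencies]
bencher = "0.1"

[[bench]]
name = "calc_primes"   # assumed bench file: benches/calc_primes.rs
harness = false        # required: bencher supplies its own main via benchmark_main!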
I expected Rust to outperform it, but I'm struggling to get rid of the flaws in the code below. So here's a call for help optimising this particular implementation.
- I don't like those `Vec` allocations; some are surely not needed... could probably pass more references here and there
- the other implementation uses a logical OR on `fvec` as a bit array, which is super quick. The Rust code below expresses the same approach in the `fold` call, but uses `Vec<bool>`, which must be slow versus the bit-array equivalent...
- use a different datatype instead of `Vec<bool>`; I wouldn't mind allocating a massive byte array and operating on it with bit shifts etc., but can't imagine putting this together...
pub fn primes(n: u64) -> Vec<usize> {
if n < 4 {
vec![2]
} else {
base(primes((n as f64).sqrt().ceil() as u64), n)
}
}
fn base(r: Vec<usize>, p: u64) -> Vec<usize> {
let fvec = (0..p).map(|_| false).collect::<Vec<bool>>();
let z = r
.iter()
.map(|&x| (0..x).map(|y| y > 0).collect::<Vec<bool>>())
.map(|x| {
(0..p)
.map(|n| !x[(n % (x.len() as u64)) as usize])
.collect::<Vec<bool>>()
})
.fold(fvec, |acc, x| {
acc.iter().zip(x).map(|(a, b)| *a || b).collect()
});
let yy: Vec<usize> = z
.iter()
.enumerate()
.filter_map(|(i, x)| if !*x { Some(i) } else { None })
.skip(1)
.collect();
r.iter().chain(yy.iter()).map(|x| *x).collect()
}
Regarding prime-number generation algorithms, there's an outstanding answer in this thread, so this is not a question about optimising the algorithm, but about this particular implementation, without external crates (it's a learning exercise).
EDIT:
After some hints from the comments section, I got it significantly optimised:
test calc_primes ... bench: 4,776,400 ns/iter (+/- 1,427,534)
test calc_primes_sieve ... bench: 644,220 ns/iter (+/- 1,655,581)
test calc_primes_2 ... bench: 268,598 ns/iter (+/- 46,440)
- Replaced the 'inefficient' `Vec<bool>` `fold` bitmap with the idiomatic iterator `step_by`
- Dropped immutable types in favour of mutable state (is there a way to keep the immutable approach?)
fn base2(head: Vec<u64>, p: u64) -> Vec<u64> {
let mut fvec = (0..p).map(|_| false).collect::<Vec<bool>>();
head.iter()
.map(|&x| {
(0..p)
.step_by(x as usize)
.for_each(|y| fvec[y as usize] = true)
})
.for_each(drop);
let tail: Vec<u64> = fvec.iter().enumerate()
.filter_map(|(i, x)| if !*x { Some(i as u64) } else { None })
.skip(1)
.collect();
head.iter().chain(tail.iter()).cloned().collect()
}
EDIT 2:
For reference, here's the 'other' language implementation:
q)p:{$[x<4; enlist 2; r, 1_ where not any x#'not til each r:p ceiling sqrt x]}
q)\ts do[1000;p 10000]
474 443088
= 474,000 ns/iter
For 64-bit q v4.0:
q)\t do[1000;p 10000]
273
= 273,000 ns/iter
Not too bad for an interpreted language.
EDIT 3
Added more benchmarks for the new implementations from the answers (as they appeared, in chronological order). I was able to shave off a tad more with no `skip(1)` and a different `fvec` allocation.
New playground code.
test ssqrt_p_1__vec_bool_n_mod ... bench: 6,838,150 ns/iter (+/- 627,389)
test sieve_p_2__mut_step_by ... bench: 367,229 ns/iter (+/- 38,279)
test ssqrt_p_3__mut_step_by ... bench: 266,189 ns/iter (+/- 56,215)
test ssqrt_p_4__push_step_by_mod ... bench: 1,255,206 ns/iter (+/- 262,107)
test sieve_p_5__for_loop_mut ... bench: 441,397 ns/iter (+/- 47,077)
test ssqrt_p_6__no_skip ... bench: 176,186 ns/iter (+/- 24,765)
The standard deviation plays a big role now... running with:
taskset -c 0 cargo bench --jobs 1
fn base3(head: Vec<u64>, p: u64) -> Vec<u64> {
let mut fvec = vec![false; p as usize];
fvec[1] = true;
head.iter()
.map(|&x| {
(0..p)
.step_by(x as usize)
.for_each(|y| fvec[y as usize] = true)
})
.for_each(drop);
let tail: Vec<u64> = fvec
.iter()
.enumerate()
.filter_map(|(i, x)| if !*x { Some(i as u64) } else { None })
.collect();
head.iter().chain(tail.iter()).cloned().collect()
}
EDIT 5
Surprisingly, the no_chain version below is slower than the base3 no_skip version. I guess that's compiler magic (tail-call optimisation in the recursive functions???).
fn base4(head: Vec<u64>, p: u64) -> Vec<u64> {
let mut fvec = vec![false; p as usize];
fvec[1] = true;
head.iter()
.map(|&x| {
(0..p)
.step_by(x as usize)
.for_each(|y| fvec[y as usize] = true);
fvec[x as usize] = false;
})
.for_each(drop);
fvec.iter().enumerate()
.filter_map(|(i, x)| if !*x { Some(i as u64) } else { None })
.collect()
}