Making parallel prime sieve with shared memory faster

Question

I have a prime sieve whose sequential version runs great. I finally figured out how to make the inner loop run in parallel, but (as I feared based on prior experience with other languages) the single threaded version is faster.

Can this parallel version in Rust be optimized?

extern crate crossbeam;

fn main() {

    let residues = [1, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67,
                   71, 73, 79, 83, 89, 97,101,103,107,109,113,121,127,131,137,139,
                  143,149,151,157,163,167,169,173,179,181,187,191,193,197,199,209,211];

    let val = 1_000_000;
    let md = 210;
    let rescnt = 48;

    println!("val = {}, mod = {}, rescnt = {}", val, md, rescnt);

    let mut posn = [0; 210];
    for i in 1..rescnt {posn[residues[i]] = i - 1;}
    posn[1] = rescnt - 1;

    let mut modk;  let mut r;  let mut k;

    let num = val - 1 | 1;
    k = num / md;  modk = md * k; r = 1;
    while num >= modk + residues[r] {r += 1;}
    let maxpcs = k * rescnt + r - 1; 
    let prms: Vec<u8> = vec![0; maxpcs];

    println!("num = {}, k = {}, modk = {}, maxpcs = {}", num, k, modk, maxpcs);

    let sqrt_n = (num as f32).sqrt() as usize;
    modk = 0;  r = 0;  k = 0;

    // sieve to identify/eliminate nonprimes/locations in prms array
    for i in 0..maxpcs {
        r += 1; if r > rescnt {r = 1; modk += md;  k += 1;};
        if  prms[i] == 1 {continue;}
        let prm_r = residues[r];
        let prime = modk + prm_r;
        if  prime > sqrt_n {break;}
        let prmstep = prime * rescnt;
        for ri in &residues[1..rescnt + 1] {
          let prms = &mut prms;
          crossbeam::scope(|scope| {
            scope.spawn(move || {
                let prod = prm_r * ri;
                let mut np = (k * (prime + ri) + (prod - 2) / md) * rescnt + posn[prod % md];
                while np < maxpcs {prms[np] = 1; np += prmstep;}
            });
          });
        }
    }
    // the prms array now has all the positions for primes r1..N
    // extract prime numbers and count from prms into prims array
    let mut prmcnt = 4;
    modk = 0; r = 0;
    for i in 0..maxpcs {
        r += 1; if r > rescnt {r = 1; modk += md;};
        if prms[i] == 0 {prmcnt += 1;}
    }
    println!("{}", prmcnt);
}

Using Rust 1.6 on Linux.

Duplicate of http://stackoverflow.com/q/33818141/155423 or http://stackoverflow.com/q/28599334/155423. — Shepmaster, Feb 23 '16 at 23:50
If you don't believe this to be a duplicate of either of them, I encourage you to [edit] your question to acknowledge that you have read them and explain why your question is different. Then you should reduce your code. Remove commented-out junk code, remove all the complicated logic around prime numbers. Create an [MCVE](/help/mcve) that shows the error you are getting with a minimum of extra fluff. Include the error you are getting *in the question* so we can know what it is you are seeing! — Shepmaster, Feb 24 '16 at 03:24
I see that you've updated your code, but it's still very large, compared to the problem you are describing. If the problem is "*The shared vector `prms` is not being updated by with index variable `nonprm`*", then you can remove **all** of the code that isn't the vector or the index. — Shepmaster, Feb 25 '16 at 13:00
Your code doesn't actually run in parallel: each call to `crossbeam::scope` spawns one thread, waits for it to end, then returns. The scope needs to enclose the `for` loop so that the spawned threads actually execute concurrently. However, just moving the whole loop into the closure passed to `crossbeam::scope` causes errors, because we can't safely give direct mutable access to `prms` to several threads at once. — Francis Gagné, Mar 13 '16 at 06:08

Making parallel prime sieve with shared memory faster

0 Answers0