Checking Uniqueness of Shifting Sieve of Eratosthenes

Question

My code is as follows:

int main(){
    vector<int> primes;
    vector<int> primesSum;
    int tally = 1;
    bool primeFound = false;
    while (true) {
        int i = 0;
        while (!primeFound) {
            primeFound = true;
            tally++;
            for (i = 0; i < (int)primes.size(); i++) {
                if (tally == primesSum[i]) {
                    primeFound = false;
                    primesSum[i] += primes[i];
                }
            }
        }
        primeFound = false;
        primesSum.push_back(tally*2);
        primes.push_back(tally);
    }
}

It seems to generate prime numbers fine but I was wondering if I could increase its efficiency by implementing the rule that all primes are 6n +/- 1 after 2 and 3. That would seem to sacrifice my achievement of initially empty vectors though.

I've seen prime number verifiers that make use of the 6n +/- i rule but not really in sieve programs, likely as it breaks the 2 filter.

Edit: as an aside I would prefer to not use modulo as that is another achievement for this program, unless a modulo is more efficient (time-wise not space-wise) than what I currently have.

Its common to combine [wheel factorization](https://en.wikipedia.org/wiki/Wheel_factorization) with sieving (particularly the [sieve of eratosthenes](https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes)). — Dillon Davis, Apr 26 '23 at 06:09
@DillonDavis Yes but I can't really figure out how to implement it without breaking it being infinitely repeatable and maintaining not knowing 2 and 3 are prime. — Joseph Cook, Apr 26 '23 at 06:18
ot: `(int)primes.size()` you should not use a casts like this to silence a compiler warning. Why is `i` not unsigned? — 463035818_is_not_an_ai, Apr 26 '23 at 07:50
@463035818_is_not_a_number thats less of me thinking it’s better in an efficiency sense and moreso a matter of preference. I can more easily grasp int at first glance than unsigned. — Joseph Cook, Apr 26 '23 at 08:53
my point is that `(int)` as cast is a red flag. It will not pass a code review. Correctness is way more important than performance. The cast is ok here, but nevertheless it hides a compiler warning without actually fixing the issue — 463035818_is_not_an_ai, Apr 26 '23 at 08:55

RandomBits · Accepted Answer · 2023-04-28T15:39:12.293

Here is some code that I wrote to efficiently generate primes using a sieve that only considers integers that are +-1 mod 6. It is written as a generator using a coroutine library, but the logic is equivalent. It uses a vector of uint64_t's as a bit mask.

There are no modulo's related to computing the primes, but they are used to access the bitmap.

Update

I extracted my original code from the generator and created a simple function that puts the primes in a vector. One method uses the +-1 mod 2 search and the other +-1 mod 6 search. I also extracted the OP's code into a function.

I ran some simple timing measurements on an M1 Max (arm64).

function	primes<1M	primes<10M	primes<100M
sieve_index	12904 ms	still running...
sieve_2n	2 ms	30 ms	222 ms
sieve_6n.	1 ms	15 ms	132 ms

The code and build instructions can be found at the GitHub repo cpp-core/so

Code

auto sieve_mod6_prime_seq(int max = int{1} << 20) {
    std::vector<int> primes;
    primes.push_back(2);
    primes.push_back(3);

    auto max_index = max / 3;
    auto bits_per = sizeof(uint64_t) * CHAR_BIT;
    auto nwords = (bits_per + max_index - 1) / bits_per;
    std::vector<uint64_t> words(nwords);

    words[0] |= 1;
    size_t wdx = 0;
    while (wdx < nwords) {
        auto b = std::countr_one(words[wdx]);
        auto p = 3 * (64 * wdx + b) + 1 + (b bitand 1);
        if (b < 64 and p < max) {
            primes.push_back(p);

            for (auto j = p; j < max; j += 6 * p) {
                auto idx = j / 3;
                auto jdx = idx / 64;
                auto jmask = uint64_t{1} << (idx % 64);
                words[jdx] |= jmask;
            }

            for (auto j = 5 * p; j < max; j += 6 * p) {
                auto idx = j / 3;
                auto jdx = idx / 64;
                auto jmask = uint64_t{1} << (idx % 64);
                words[jdx] |= jmask;
            }
        }
        else {
            ++wdx;
        }
    }
    return primes;
}

auto sieve_mod2_prime_seq(int max = int{1} << 20) {
    std::vector<int> primes;

    auto bits_per = 2 * sizeof(uint64_t) * CHAR_BIT;
    auto nwords = (bits_per + max - 1) / bits_per;
    std::vector<uint64_t> words(nwords);
    if (max > 2)
        primes.push_back(2);
    words[0] |= 1;
    size_t idx = 0;
    while (idx < nwords) {
        auto bdx = 2 * std::countr_one(words[idx]) + 1;
        auto p = 128 * idx + bdx;
        if (bdx < 128 and p < max) {
            primes.push_back(p);
            for (auto j = p; j < max; j += 2 * p) {
                auto jdx = j / 128;
                auto jmask = uint64_t{1} << ((j % 128) / 2);
                words[jdx] |= jmask;
            }
        }
        else {
            ++idx;
        }
    }
    return primes;
}

auto sieve_index(int max = int{1} << 20) {
    std::vector<int> primes;
    std::vector<int> primesSum;
    int tally = 1;
    bool primeFound = false;
    while (tally < max) {
        int i = 0;
        while (!primeFound) {
            primeFound = true;
            tally++;
            for (i = 0; i < (int)primes.size(); i++) {
                if (tally == primesSum[i]) {
                    primeFound = false;
                    primesSum[i] += primes[i];
                }
            }
        }
        primeFound = false;
        primesSum.push_back(tally*2);
        primes.push_back(tally);
    }
    return primes;
}

template<class Work>
void measure(std::ostream& os, std::string_view desc, Work&& work) {
    chron::StopWatch timer;
    timer.mark();
    work();
    auto millis = timer.elapsed_duration<std::chrono::milliseconds>().count();
    os << fmt::format("{:>30s}: {} ms", desc, millis) << endl;
}

int tool_main(int argc, const char *argv[]) {
    ArgParse opts
        (
         argValue<'n'>("number", 1000, "Number of primes"),
         argFlag<'v'>("verbose", "Verbose diagnostics")
         );
    opts.parse(argc, argv);
    auto n = opts.get<'n'>();
    // auto verbose = opts.get<'v'>();

    measure(cout, "sieve_index", [&]() { sieve_index(n); });
    measure(cout, "sieve_2n", [&]() { sieve_mod2_prime_seq(n); });
    measure(cout, "sieve_6n", [&]() { sieve_mod6_prime_seq(n); });
    return 0;
}

End Update

You can easily adapt it to construct a vector of primes by replacing each co_yield with vec.push_back.

coro::Generator<uint64_t> sieve_mod6_prime_seq(uint64_t max = uint64_t{1} << 20) {
    co_yield 2;
    co_yield 3;

    auto max_index = max / 3;
    auto bits_per = sizeof(uint64_t) * CHAR_BIT;
    auto nwords = (bits_per + max_index - 1) / bits_per;
    std::vector<uint64_t> words(nwords);

    words[0] |= 1;
    size_t wdx = 0;
    while (wdx < nwords) {
        auto b = std::countr_one(words[wdx]);
        auto p = 3 * (64 * wdx + b) + 1 + (b bitand 1);
        if (b < 64 and p < max) {
            co_yield p;

            for (auto j = p; j < max; j += 6 * p) {
                auto idx = j / 3;
                auto jdx = idx / 64;
                auto jmask = uint64_t{1} << (idx % 64);
                words[jdx] |= jmask;
            }

            for (auto j = 5 * p; j < max; j += 6 * p) {
                auto idx = j / 3;
                auto jdx = idx / 64;
                auto jmask = uint64_t{1} << (idx % 64);
                words[jdx] |= jmask;
            }
        }
        else {
            ++wdx;
        }
    }
    co_return;
}

That looks really nice and I'll have to give it a go later, but wouldn't the large amount of multiplication and division increase the time complexity? — Joseph Cook, Apr 28 '23 at 02:37
On modern hardware, arithmetic is much cheaper than memory access making it worthwhile in this case. Also, the division and modulo by 64 are converted to simple shift and mask operations by the compiler. I believe the code is well optimized at least for arm64 -- I haven't tested it on x86. — RandomBits, Apr 28 '23 at 04:15

Will Ness · Answer 2 · 2023-06-02T01:04:52.227

I must say, the naming of variables and the code structure as posted in the question, with the double while instead of an if, is very non-conducive to the code clarity.

Consequently, I'd re-structure your code as

int main(){
    vector<int> primes;   // primes known so far
    vector<int> mults;    // smallest multiple of each prime, above c
    int c = 1;            // candidates: 2,3,...
    bool composite;
    while (true) {
        composite = false;
        c++;              // 2,3,...
        for (int i = 0; i < (int)primes.size(); i++) {
            if (c == mults[i]) {  // if c is a multiple of ith prime,
                composite = true;   // then it's composite.
                mults[i] += primes[i]; // next multiple of this prime
            }
        }
        if( !composite ) {
            mults.push_back(c*2);   // first multiple of 
            primes.push_back(c);    //   newly found prime
        }
    }
}

Now we can see, first of all, a major algorithmic deficiency in your code: the wrong "data structure" you use to maintain the primes' multiples. Instead of min-heap priority queue which would immediately produce the next known multiple of a prime, you maintain two linear vectors and scan through them for each candidate number. This is bound to have major implications, complexity-wise. And with worse complexity no amount of constant factors improvements will help much.

But putting this aside, on to your question. You enumerate the multiples of each prime p as 2p, 3p, 4p, ....

First step is to switch it to p², p²+p, p²+2p, ....

Next, we notice that for all odd primes, p² is odd as well and so the enumeration can be changed to p², p²+2p, p²+4p, ..., to enumerate all odd multiples of it.

And what to do with the even ones? How to test for them now? The whole point to this "wheel factorization" optimization is to not do any such tests, but just ignore them. That's why it's an optimization in the first place.

And the way to ignore them is to not generate any even candidates, in the first place -- because we known in advance that all of them aren't primes any way. And that is the essence of the wheel optimization:

int main(){
    vector<int> primes;   // primes known so far
    vector<int> mults;    // smallest multiple of each prime, above c
    int c = 1;            // candidates: 3,5,...
    bool composite;
    while (true) {
        composite = false;
        c+=2;             // 3,5,...
        for (int i = 0; i < (int)primes.size(); i++) {
            if (c == mults[i]) {  // if c is a multiple of ith prime,
                composite = true;   // then it's composite
                mults[i] += 2*primes[i]; // next multiple of this prime
            }
        }
        if( !composite ) {
            mults.push_back(c*c);   // first multiple of 
            primes.push_back(c);    //   newly found prime
        }
    }
}

Next, switching to your desired mod 6 +-1 enumeration, we will ignore the multiples of 2 and 3. But then it means, we'll need to not generate any of them as well. The enumeration of multiples will be m=p²; m+=4; m+=2; m+=4; m+=2; m+=4; m+=2; ... for some, and m=p²; m+=2; m+=4; m+=2; m+=4; m+=2; m+=4; ... for others (you can figure out the details).

And the enumeration of the candidates will just start from 5, and also proceed as c=5; c+=2; c+=4; c+=2; c+=4; c+=2; c+=4; ....

Hopefully you can write this down now in an actual code.

P.S. you can find more details in several of my answers on the matter, e.g. here and here etc..

Checking Uniqueness of Shifting Sieve of Eratosthenes

2 Answers2

Code