
Just like this question, I am also working on the sieve of Eratosthenes, from chapter 4 of the book "Programming: Principles and Practice Using C++". I was able to implement it correctly, and it functions exactly as the exercise asks.

#include <iostream>
#include <vector>

using namespace std;

int main() {
    unsigned int amount = 0;

    cin >> amount;

    vector<int>numbers;

    for (unsigned int i = 0; i <= amount; i++) {
        numbers.push_back(i);
    }

    for (unsigned int p = 2; p < amount; p++) {
        if (numbers[p] == 0)
            continue;

        cout << p << '\n';

        for (unsigned int i = p + p; i <= amount; i += p) {
            numbers[i] = false;
        }
    }

    return 0;
}

Now, how would I be able to handle really big numbers in the amount input? The unsigned int type should allow me to enter numbers up to 2^32 - 1 = 4,294,967,295. But I can't: I run out of memory. Yes, I've done the math: storing 2^32 ints of 32 bits each takes 32/8 * 2^32 = 16 GiB of memory. I have just 4 GiB...

What I am really doing here is setting non-primes to zero, so I could use a boolean instead. But a bool would still take 8 bits, so 1 byte each. Theoretically I could then go up to the limit of unsigned int (8/8 * 2^32 = 4 GiB), using some of my swap space for the OS and overhead. But I have an x86_64 PC, so what about numbers larger than 2^32?

Knowing that primes are important in cryptography, there must be a more efficient way of doing this, right? And are there also ways to optimize the time needed to find all those primes?


2 Answers

In the sense of storage, you could use the std::vector<bool> container. Because of how it works, you trade speed for storage: it stores one bit per boolean, so your storage becomes 8 times as efficient. You should be able to get close to 8 * 4,294,967,296 numbers if you have all your RAM available for this one program. The only thing you need to do is use unsigned long long to unlock 64-bit numbers.

Note: Testing the program with the code example below, with an amount input of 8 billion, the program ran with a memory usage of approx. 975 MiB, confirming the theoretical number.

You can also gain some time, because you can declare the complete vector at once, without iteration: vector<bool> numbers(amount + 1, true); creates a vector with amount + 1 elements (indices 0 through amount, so the sieve can index up to amount), all set to true. Now, you can adjust the code to set non-primes to false instead of 0.

Furthermore, once the sieve has passed the square root of amount, all numbers that remain true are primes. Insert if (p * p >= amount) as an additional continue condition, just after you output the prime number, so the inner loop is skipped when it can no longer mark anything new. This is also a modest improvement to your processing time.

Edit: In the inner loop, i can start at p * p, because every smaller multiple of p has a prime factor smaller than p and has therefore already been marked.

All together you should get something like this:

#include <iostream>
#include <vector>

using namespace std;

int main() {
    unsigned long long amount = 0;

    cin >> amount;

    vector<bool> numbers(amount + 1, true);

    for (unsigned long long p = 2; p < amount; p++) {
        if ( ! numbers[p])
            continue;

        cout << p << '\n';

        if (p * p >= amount)
            continue;

        for (unsigned long long i = p * p; i <= amount; i += p) {
            numbers[i] = false;
        }
    }

    return 0;
}
    shouldn't you start cycle from i=p*p in the last loop? – ig-melnyk Jan 20 '15 at 19:38
  • No, that would square the numbers. Imagine the first loop is at p=3. Then i=3*3 = 9. This would skip 6. Wait... that has been put on false by 2 already. Let me think about p=5. then i=25. 10 is already false'd by 2 and 15 by 3. I _think_ you are right, I will check – Tim Jan 20 '15 at 19:53
  • Yes, program outputs exactly the same numbers, but it increases the speed. I will edit the answer. – Tim Jan 20 '15 at 19:55

You've asked a couple of different questions.

For primes up to 2**32, sieving is appropriate, but you need to work in segments instead of in one big block. My answer here tells how to do that.

For cryptographic primes, which are very much larger, the process is to pick a number and then test it for primality, using a probabilistic test such as a Miller-Rabin test or a Baillie-Wagstaff test. This process isn't perfect, and occasionally a composite might be chosen instead of a prime, but such an occurrence is very rare.

  • If I understand correctly, segments remove the memory limits I'm hitting. So it would be possible to come close to 2^64? (Which would be significantly more than the limits I have now.) – Tim Jan 20 '15 at 20:19
  • It removes the limit on the size of the sieve. But you must also either pre-compute and store the sieving primes, or compute them on the fly. It is possible to sieve to 2**64, but difficult. You might find [this page](http://sweet.ua.pt/tos/software/prime_sieve.html) interesting. – user448810 Jan 20 '15 at 20:44