-2

How do I find the sum of all primes below 10^12?

I use sieve of Eratosthenes with O(n * log(log(n))) but I want an algorithm to calculate this faster.

My code runs in 4 sec for 10^8, but it takes many hours to calculate 10^12.

This is my code:

#include <iostream>
#include <vector>
#include <algorithm>

#define big  long long int
#define ubig unsigned long long  
#define SZ(x) int(x.size())
#define Pb push_back

using namespace std;

const big maxn = 1000 * 1000 + 10;
const big mod = 100000000;
const big Delta = 100000000;

bool s[mod], p[maxn]; 
big b[maxn]; 

vector <big> primes; 

ubig ans = 0, n;

void init_primes () {
    for (big i = 2; i <= 1000 * 1000; i++) {

        if (p[i]) continue;

        primes.Pb(i);

        b[SZ(primes)-1] = primes.back() * primes.back();

        for (big j = i * i; j <= 1000 * 1000; j += i) {
            p[j] = 1;
        }

    }
}

void sieve (big from, big to) {

    cerr << to / mod << " of " << n / mod  << "\n" ;

    fill (s,s+mod,0);

    for (big i = 0; i < SZ(primes) && primes[i] <= to; i++) {

        big j = b[i];

        for (; j <= to; j += primes[i]) {
            s[j%mod] = 1;
        }

        if (j >= b[i]) {
            b[i] = j;
        }

    }

    for (big k = 0; k < mod; k++) {
        if (s[k] == 0) {
            ans += from + k;
            // ans %= Delta;
        }
    }

}

int main () {

    init_primes();

    n = 1000ll * 1000 * 1000 * 1000;
    // cin >> n;

    for (big i = 0; i + mod <= n; i += mod) {
        sieve (i,i+mod);
    }

    cout << ans-1 << endl;
}

The overflow from long long not matter now!

Iman.Ejt
  • 11
  • 10
  • 1
    If you just want fast, consider an approximation: http://math.stackexchange.com/a/777358 – Ira Baxter Jul 05 '15 at 07:39
  • 1
    Which techniques exist in the literature that are faster than Sieve of Eratosthenes? Have you tried them? How have you attempted to optimize your code? – Jonathan Leffler Jul 05 '15 at 07:40
  • 1
    Bitwise sieve can help you. – Mr. Perfectionist Jul 05 '15 at 07:44
  • 1
    @Iman.Ejt That comment makes no sense as a response to Jonathan Leffer's. "i tried all of them" -- then why are you even still asking about the Sieve of Eratosthenes? Or did you somehow manage to conclude that all other algorithms are worse and that you should stick with what you have? If so, how did you get to that conclusion? –  Jul 05 '15 at 08:04
  • 1
    I can get a sane answer for sum of primes up to 10^11 in less than a quarter of an hour: `Sum of primes to 100000000000 = 16999637006649164854 Count of primes to 100000000000 = 4118054813 real 13m18.378s user 13m7.114s sys 0m10.358s`. That uses a Sieve of Eratosthenes which only records the odd values in a bit mask (so it needs 10^11 / (64 * 2) `uint64_t` values). Multiplying by another factor of 10 exceeds the capacities of my machine; it starts paging (to magnetic disk) which is horribly slow. Therefore, my technique isn't good enough for this particular problem. – Jonathan Leffler Jul 05 '15 at 08:22
  • 1
    @JonathanLeffler without paging your projected time is 2.8h (assuming N^1.1 order of growth). I have a simple C++ [contiguous `vector` odds-only code](http://ideone.com/FaPOB) which projects at 5.2h (which is a similar time if we assume your box is ~2x faster than ideone). But if we switch to segmented sieve, paging should be gone, and if the segment fits into the cache, it should be yet faster overall. – Will Ness Jul 05 '15 at 11:23
  • 1
    You should take a look at [Segmented Sieve of Eratosthenes](http://stackoverflow.com/questions/10249378/segmented-sieve-of-eratosthenes). Since `sqrt(1E12)` is just `1E6`, it is pretty easy to build the set of primes less than the one million, and then apply that to the values in the range up to 1E12 (a trillion). – Jonathan Leffler Jul 06 '15 at 00:48

1 Answers1

2

I see two immediately obvious ways to speed up your algorithm.

First, you are using 1000 * 1000 as the limit for your loops. Better to do the calculation once: big limit = 1000 * 1000; and use the variable limit in your loops. Depending on your compiler, you might be repeating the multiplication every time round your loops.

Second, you are stepping your i-loop by 1. Don't do that. Treat the case for i=2 separately, and then loop from i=3, stepping 2 each time. That will halve the number of iterations of your outer loop. For further savings look at "Wheel factorization" methods.

You might want to try making p[] a bit array, rather than a boolean array to save on memory usage. Your speed problems might be due to an overloaded memory and too much disk swapping.

rossum
  • 15,344
  • 1
  • 24
  • 38
  • I think `vector` is automatically bit-packed. – Will Ness Jul 05 '15 at 11:25
  • Possibly true, my C++ is very rusty. Even if it isn't, there should be a way to do it; C++ has everything plus the kitchen sink. :) – rossum Jul 05 '15 at 11:40
  • I know it was so in previous versions of the lagnuage; don't know about the last one. This feature came under severe criticism because such "vector" can't return a true reference to its element. That was the claim IIRC. – Will Ness Jul 05 '15 at 11:45