
I'm currently trying to compare the average runtime of two different prime number generating algorithms. I have this naive implementation of the Sieve of Eratosthenes:

#include <chrono>
#include <cstdint>
#include <iostream>
#include <vector>

std::vector<int32_t> sieve_of_eratosthenes(int32_t n) {
    std::vector<int32_t> primes;
    std::vector<bool> sieve(n + 1, true);
    for (int32_t i = 2; i * i <= n; i++) {
        if (sieve[i]) {
            for (int32_t j = i * i; j <= n; j += i) {
                sieve[j] = false;
            }
        }
    }
    for (int32_t i = 2; i <= n; ++i) {
        if (sieve[i]) primes.push_back(i);
    }
    return primes;
}

and this implementation of the Sieve of Sundaram:

std::vector<int32_t> sieve_of_sundaram(int32_t n) {
    std::vector<int32_t> primes;
    int32_t k = (n - 1) / 2;
    std::vector<bool> sieve(k + 1, true);
    for (int32_t i = 1; i * i <= k; ++i) {
        if (sieve[i]) {
            for (int32_t j = i; i + j + 2 * i * j <= k; ++j) {
                sieve[i + j + 2 * i * j] = false;
            }
        }
    }
    if (n > 2) primes.push_back(2);
    for (int32_t i = 1; i <= k; ++i) {
        if(sieve[i]) primes.push_back(2 * i + 1);
    }
    return primes;
}

This is how I measure the speed of the algorithms. test is an argument passed on the command line. The program is run manually twice: the first run prints the average runtime of sieve_of_eratosthenes and the second run prints that of sieve_of_sundaram.

std::vector<int32_t> test_input(1000, 100000);
std::vector<int32_t> result;
switch (test) {
    case 1:
        for (auto n : test_input) {
            auto start = std::chrono::high_resolution_clock::now();
            auto tmp = sieve_of_eratosthenes(n);
            auto end = std::chrono::high_resolution_clock::now();
            int32_t runtime = std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count();
            result.push_back(runtime);
        }
        break;
    case 2:
        for (auto n : test_input) {
            auto start = std::chrono::high_resolution_clock::now();
            auto tmp = sieve_of_sundaram(n);
            auto end = std::chrono::high_resolution_clock::now();
            int32_t runtime = std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count();
            result.push_back(runtime);
        }
        break;
    default:
        break;
}
std::cout << get_avg(result); // sum of all test results / result.size()

According to the theoretical time complexities, the Sieve of Eratosthenes should be the faster one: O(n log log n) beats O(n log n), the complexity of the Sieve of Sundaram. However, in my tests the Sieve of Sundaram is actually much faster (almost 3x faster than the Sieve of Eratosthenes).

Result (measured in nanoseconds):

434379 (sieve_of_eratosthenes)
179904 (sieve_of_sundaram)
trietng
  • How do you compile? – 273K Nov 15 '22 at 05:27
  • @273K I compile using the mingw g++ with the -O2 flag. – trietng Nov 15 '22 at 05:45
  • The difference between *log(log(n))* and *log(n)* is so small that constant factors quickly come into play. – Captain Giraffe Nov 15 '22 at 05:45
  • @CaptainGiraffe I tried it once again with 1000000 and it's still the same difference multiplied by 10. – trietng Nov 15 '22 at 06:34
  • @trietng Yeah, that's the dominating factor *n*. The difference in *log(log(n))* and *log(n)* is really small at those scales. Not something really measurable. – Captain Giraffe Nov 15 '22 at 08:18
  • Could you include the definition of `get_avg` as well? In fact, showing the entire program, and explaining exactly which flags you used to compile, and the exact steps taken to run the program will be helpful, as others can then try and replicate the results you're getting. – cigien Nov 15 '22 at 09:33
  • Well, look how many times the inner loop is executed: https://godbolt.org/z/nK1r4caM4 – Bob__ Nov 15 '22 at 09:33
  • @CaptainGiraffe I just did some more reading on this problem and there is indeed a hidden constant factor! This implementation does not clearly show that, so I was confused. Thank you. – trietng Nov 15 '22 at 11:29
  • @Bob__ Also, thank you too, Bob. I appreciate your effort in showing me the number of executions in the inner loop. Things finally clicked for me when I saw your godbolt output. – trietng Nov 15 '22 at 11:31
  • If you switch to a simple odds-only SoE, the inner loop count becomes identical. This is a very small change. – DanaJ Nov 16 '22 at 04:47
  • always measure [Empirical orders of growth](http://en.wikipedia.org/wiki/Analysis_of_algorithms#Empirical_orders_of_growth) to get a [more meaningful comparison](https://stackoverflow.com/a/30116605/849891) – Will Ness Nov 25 '22 at 23:12

1 Answer


You have not coded up the classic Sieve of Sundaram (which sieves by all the odd numbers and indeed requires O(N log N) operations), but rather a small modification of it: the `if (sieve[i]) ...` test makes it equivalent to a version of the Sieve of Eratosthenes that avoids considering the even numbers.
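
As a sketch of what that equivalent sieve looks like (this function is only an illustration; the name sieve_of_eratosthenes_odds is made up), index i stands for the odd number 2*i + 1, exactly as in your sieve_of_sundaram:

#include <cstdint>
#include <vector>

// Odds-only Sieve of Eratosthenes: sieve index i represents the odd number 2*i + 1.
std::vector<int32_t> sieve_of_eratosthenes_odds(int32_t n) {
    std::vector<int32_t> primes;
    if (n < 2) return primes;
    int32_t k = (n - 1) / 2;
    std::vector<bool> sieve(k + 1, true);
    // (2*i + 1)^2 <= n  is the same condition as  2*i*(i + 1) <= k
    for (int32_t i = 1; 2 * i * (i + 1) <= k; ++i) {
        if (sieve[i]) {
            // start crossing off at (2*i + 1)^2, whose index is 2*i*(i + 1), stepping by 2*i + 1
            for (int32_t j = 2 * i * (i + 1); j <= k; j += 2 * i + 1) {
                sieve[j] = false;
            }
        }
    }
    primes.push_back(2);
    for (int32_t i = 1; i <= k; ++i) {
        if (sieve[i]) primes.push_back(2 * i + 1);
    }
    return primes;
}

Its inner loop crosses off exactly the same sieve positions as your sieve_of_sundaram, which is why, as DanaJ notes in the comments, switching to an odds-only SoE makes the inner loop counts identical.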

Since your other code uses the classic Sieve of Eratosthenes (which sieves by primes) and does consider the even numbers, your supposed Sundaram sieve is indeed faster, by a constant factor.

This is explained in the Wikipedia article on the Sieve of Sundaram.
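
One rough way to see that constant factor directly (a small sketch with illustrative helper names, not code from the question) is to count how many times each inner loop executes for the same n:

#include <cstdint>
#include <iostream>
#include <vector>

// Count the crossings (inner-loop executions) of the classic Sieve of Eratosthenes.
int64_t count_eratosthenes_crossings(int32_t n) {
    std::vector<bool> sieve(n + 1, true);
    int64_t crossings = 0;
    for (int32_t i = 2; i * i <= n; ++i) {
        if (sieve[i]) {
            for (int32_t j = i * i; j <= n; j += i) {
                sieve[j] = false;
                ++crossings;
            }
        }
    }
    return crossings;
}

// Same count for the "Sundaram" variant from the question.
int64_t count_sundaram_crossings(int32_t n) {
    int32_t k = (n - 1) / 2;
    std::vector<bool> sieve(k + 1, true);
    int64_t crossings = 0;
    for (int32_t i = 1; i * i <= k; ++i) {
        if (sieve[i]) {
            for (int32_t j = i; i + j + 2 * i * j <= k; ++j) {
                sieve[i + j + 2 * i * j] = false;
                ++crossings;
            }
        }
    }
    return crossings;
}

int main() {
    std::cout << count_eratosthenes_crossings(100000) << '\n'
              << count_sundaram_crossings(100000) << '\n';
}

For n = 100000 the first count should come out roughly two to three times larger than the second, which lines up with the timing gap and with the inner-loop comparison linked in the comments.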

Will Ness