
I wrote two prime-finder functions, and the sieve version performs only about 10% better. I'm using two optimizations for the simple version:

  • Don't check even numbers.
  • Only check divisors up to the square root, i.e. `j * j <= i` (equivalent).

and one optimization for the sieve version:

  • Only check up to the square root, i.e. `i * i <= n` (equivalent).

What optimizations can I add to the sieve?

My sieve is pretty slow. I don't want to do a bitwise implementation yet; first I want to understand whether this implementation offers any benefits, or whether I missed an implementation point.

The inner for loop in the pseudocode here looks interesting/odd:

https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes

I don't know how to interpret it. (Update: as established in the comments, this was just incorrect formatting introduced when the pseudocode was copy-pasted from Wikipedia; with the formatting corrected, as below, it is clear.)

Here it is:

algorithm Sieve of Eratosthenes is:
    input: an integer n > 1.
    output: all prime numbers from 2 through n.

    let A be an array of Boolean values, indexed by integers 2 to n,
    initially all set to true.

    for i = 2, 3, 4, ..., not exceeding √n do
        if A[i] is true
            for j = i², i²+i, i²+2i, i²+3i, ..., not exceeding n do
                A[j] := false

    return all i such that A[i] is true.
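
(For reference, a minimal line-for-line JS transliteration of the corrected pseudocode might look like the sketch below; it is not one of the benchmarked functions. Note that its inner loop starts at i², i.e. i * i, and its range is inclusive of n.)

// direct transliteration of the corrected Wikipedia pseudocode (sketch)
function sieveOfEratosthenes(n) {
  const A = new Array(n + 1).fill(true); // indices 0..n; 0 and 1 are unused
  for (let i = 2; i * i <= n; i++) {     // i not exceeding √n
    if (A[i]) {
      for (let j = i * i; j <= n; j += i) { // j = i², i²+i, ..., not exceeding n
        A[j] = false;
      }
    }
  }
  const result = [];
  for (let i = 2; i <= n; i++) {
    if (A[i]) result.push(i); // all i such that A[i] is true
  }
  return result;
}
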
// prime-2
// 2 optimizations - odds and square root
function prime2(n){
  const primes = [2];
  // labeled loop: `continue not_prime` abandons the current candidate
  not_prime: for(let i = 3; i < n; i += 2){
    for(let j = 2; j * j <= i; j++){ // trial division up to √i
      if(i % j === 0){
        continue not_prime; // i is composite
      }
    }
    primes.push(i); // no divisor found: i is prime
  }
  return primes;
}

// prime-3
// sieve implementation
function prime3 (n) {
  const primes = [];
  const sieve = new Array(n).fill(true);
  for (let i = 2; i * i <= n; i += 1) { // only up to √n
    if (sieve[i]) {
      for (let j = i + i; j < n; j += i) { // cross off every multiple of i
        sieve[j] = false;
      }
    }
  }
  makePrimes(sieve, primes, n);
  return primes;
}

// collect the indices still marked true
function makePrimes(sieve, primes, n){
  for (let i = 2; i < n; i++) {
    if(sieve[i]) {
      primes.push(i);
    }
  }
}

  • What is your test input for comparison? Also, your sieve can set `j = i * i` to start, not `i + i` (because anything below `i * i` should already have been sieved by other multiples), and can increment `j` by `i * 2`, not just `i`, each time (because the other half of all multiples would be even multiples). The `j += i * 2` optimization should halve the work, and `let j = i * i` should skip quite a bit of unnecessary work (can't be bothered to work out if it affects big-O in any way, or even produces a strict divisor on work, but it's something). – ShadowRanger Feb 24 '20 at 15:41
  • Is `i2` there meant to mean `i²`? If so, that's reasonable. It's just that you can simplify further to make it `j = i², i²+2i, i²+4i, i²+6i, ..., not exceeding n`, as long as multiples of 2 are special-cased, such that they're either preset to "not prime" or not processed at all, because `+i`, `+3i`, etc. are guaranteed to be even when added to `i²` (because `i` is odd, so `i²` is also odd, so `i²` plus any odd multiple of `i` will be even, and can't possibly be prime). – ShadowRanger Feb 24 '20 at 16:10
  • My math is terrible. It's just that, for reasons I don't even remember, through every new technique and every new language I've learned, one of my first self-tests is to use it to write a better Sieve of Eratosthenes using the new language/technique, so I'm good with it specifically, not higher math in general (I was a math minor, but haven't applied it much). :-) – ShadowRanger Feb 25 '20 at 19:29
  • Note that there's a bug in `prime2`, which reports `9`, `25`, `49`, and `81` as prime. To fix it, replace ` j * j < i` with ` j * j <= i`. – Scott Sauyet Feb 26 '20 at 14:52
  • I find a large speed difference. When calculating the primes up to `1e5`, `prime2` takes 24 ms on my machine, while `prime3` takes 10 ms. For `1e6` it's 279 ms vs 36 ms. For `1e7` it's 6946 ms vs 291 ms. – Scott Sauyet Feb 26 '20 at 15:19
  • @ScottSauyet kudos for measuring it that way, ready for the [empirical orders of growth](http://en.wikipedia.org/wiki/Analysis_of_algorithms#Empirical_orders_of_growth) analysis to be applied! The 1e5 times are too small to be taken into account, but the rest gives us approx *n^1.4* for the first, and *n* for the second, *empirically*, which is the true measure of the difference between them. Both are roughly in line with the true complexities of the *optimal trial division* sieve and the *true sieve of Eratosthenes* (on smallish ranges). – Will Ness Feb 26 '20 at 16:59
  • @WillNess: There are also space requirements to consider. Either of these has `O(n)` storage requirements. There's an [alternate version of the Sieve](https://www.cs.hmc.edu/~oneill/papers/Sieve-JFP.pdf) (Haskell, but not too hard to port to JS) that -- at a somewhat inexperienced glance -- takes space of `O(sqrt(n) / log(n))`. But it's much slower in my tests. There are always trade-offs! – Scott Sauyet Feb 26 '20 at 18:26
  • @j.a. Yeah, starting with the square and adding `i` each time is what you need to do if you don't special-case `2` in some way. If you do special-case `2` though (e.g., by simply hard-coding `2` as prime and ignoring all even indices, or not even allocating them), then you can add `2 * i` each time, because `i * i` is odd, so adding any odd multiple of `i` to it will produce an even number (which can't be prime except in the trivial case of `2`). Thus, half the work, in exchange for special-casing `2`. – ShadowRanger Feb 26 '20 at 18:26
  • @ScottSauyet you probably refer to the segmented sieve; of course space is also important, but the time complexity stays the same. and as the OP is discovering here, whatever optimizations are present or *missing*, still the algorithmic complexity is the most important thing about a given code. :) --- as to the Haskell code, I would humbly recommend [this page](https://wiki.haskell.org/Prime_numbers). And if you read Python, check out my highest voted [primes] answer. :) – Will Ness Feb 26 '20 at 18:28
  • @ShadowRanger the pseudocode is written that way so that it does not become overly complex. it serves an illustrative purpose first and foremost. even the presence of the start-from-the-square optimization could be argued against, but to add any more optimizations into it would clearly be counterproductive. – Will Ness Feb 26 '20 at 18:30
  • @WillNess: Yeah, I'm not arguing the pseudo-code is wrong or deficient somehow. It's pseudocode illustrating the algorithm, not an actual implementation with real world optimizations built-in. Just pointing out that if they're really concerned with performance, actual computers benefit from going beyond the pseudo-code for additional optimizations. – ShadowRanger Feb 26 '20 at 18:32
  • @ScottSauyet actually the code in the article misses that optimization, so it's O(n/log n) in space. the author later added that ["postponement"](https://stackoverflow.com/search?q=user%3A849891+postponed+%5Bprimes%5D) optimization (without naming it) to the code in the accompanying source code ZIP file, to achieve the reduction in space complexity that you mention. – Will Ness Feb 26 '20 at 18:36
  • @WillNess: I wasn't really talking about a segmented sieve, just one like http://gist.github.com/CrossEye/e81013fe5dd4f4d036b855e14b1f78d3. I wrote that long ago, and only updated it to use generators a few years ago; but the basic idea is standard. The storage needed (assuming you consume the generated values and don't try to store them) should be proportional to the number of primes below sqrt(n). Interesting reading, BTW! – Scott Sauyet Feb 26 '20 at 19:23
  • @ScottSauyet yeah, I later realized this, that's why I switched to mentioning that Python example and the "postponement" stuff. thanks :) – Will Ness Feb 26 '20 at 19:25
  • @ShadowRanger thanks a lot for your feedback! – Will Ness Feb 26 '20 at 19:27
  • @j.a: For some reason, my company is blocking repl.it. I posted a version of this at https://runkit.com/crosseye/so-60378812. The numbers seem similar to what I reported out of my local browser. – Scott Sauyet Mar 02 '20 at 15:04
  • 30K / 1K for last measurement. – jennifer Mar 02 '20 at 15:53
  • Compare it to [highly-optimized implementations](https://github.com/vitaly-t/prime-lib/blob/main/src/soe-generators.ts). – vitaly-t Sep 20 '22 at 00:58

1 Answer

What you see is an expression of the difference in theoretical run-time complexities, i.e. the true algorithmic difference between the two approaches.

The optimal trial division sieve's complexity is O(n^1.5/(log n)^2) (*), whereas the sieve of Eratosthenes' complexity is O(n log log n).

According to the empirical run time figures posted by Scott Sauyet in the comments,

    n        prime2     prime3
   1e6       279ms       36ms
   1e7      6946ms      291ms
   ---------------------------
   n^         1.40       0.91

the empirical orders of growth are roughly ~n^1.4 and ~n in the measured range, which is a good fit.
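
(The exponents are just log(t₂/t₁) / log(n₂/n₁): log₁₀(6946/279) ≈ 1.40 and log₁₀(291/36) ≈ 0.91.)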

So your genuine sieve does perform well, and the trial division sieve performs as expected. The algorithmic nature of the code will always beat the presence or absence of any secondary optimizations, if we increase the problem size enough.

Comparing performances by measuring them at just one problem size is never enough. Even if you see only a 10% difference over the "simpler" version now, test at bigger sizes and the difference will grow.
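
A quick, rough way to check this yourself in a browser console or Node (a sketch; console.time is coarse, so treat the numbers as indicative only):

// time both functions at several problem sizes
for (const n of [1e5, 1e6, 1e7]) {
  console.time(`prime2 ${n}`); prime2(n); console.timeEnd(`prime2 ${n}`);
  console.time(`prime3 ${n}`); prime3(n); console.timeEnd(`prime3 ${n}`);
}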


If you want some pointers about what can be further improved in your code: for starters, note that you start the inner loop from `i + i` instead of from `i * i`.
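
A minimal sketch of that fix, changing only the starting point of the inner loop in your prime3 (every composite below `i * i` has a smaller prime factor, so an earlier pass has already crossed it off):

      // in prime3, start crossing off at the square:
      for (let j = i * i; j < n; j += i) { // was: let j = i + i
        sieve[j] = false;
      }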

Another common optimization is to special-case 2: start the candidates from 3, increment them by 2, and use an inner-loop increment of `2*i` instead of just `i`, for an instant 2x speedup. This is the simplest form of the wheel factorization optimization, which can be applied further, though with diminishing returns for each additional prime. Using the 2-3-5-7 wheel is common and should give about another 2x speedup, if memory serves.
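
One possible sketch of both changes combined, under the same conventions as your prime3 (primes strictly below n; the name prime3odds is made up for illustration):

// odds-only sieve: 2 is special-cased, even indices are never consulted
function prime3odds(n) {
  const primes = n > 2 ? [2] : [];
  const sieve = new Array(n).fill(true);
  for (let i = 3; i * i <= n; i += 2) {        // odd candidates only
    if (sieve[i]) {
      for (let j = i * i; j < n; j += 2 * i) { // i*i is odd; stepping 2*i skips even multiples
        sieve[j] = false;
      }
    }
  }
  for (let i = 3; i < n; i += 2) {             // collect the surviving odd numbers
    if (sieve[i]) primes.push(i);
  }
  return primes;
}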

Last but not least, make it segmented.
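
A rough sketch of segmentation (the name segmentedSieve and the segment size are made up for illustration): sieve the base primes up to √n first, then cross off their multiples one fixed-size window at a time.

// segmented sieve sketch: reuses prime3 from the question for the base primes
function segmentedSieve(n, segmentSize = 1 << 16) {
  const limit = Math.floor(Math.sqrt(n));
  const basePrimes = prime3(limit + 1);      // primes not exceeding √n
  const primes = basePrimes.slice();
  for (let low = limit + 1; low <= n; low += segmentSize) {
    const high = Math.min(low + segmentSize - 1, n);
    const segment = new Array(high - low + 1).fill(true);
    for (const p of basePrimes) {
      // first multiple of p inside [low, high], but no smaller than p*p
      const start = Math.max(p * p, Math.ceil(low / p) * p);
      for (let j = start; j <= high; j += p) {
        segment[j - low] = false;
      }
    }
    for (let i = low; i <= high; i++) {
      if (segment[i - low]) primes.push(i);
    }
  }
  return primes; // note: inclusive of n, unlike prime3
}

Only the base primes and one window of booleans are live sieve state at any moment, which is what makes the segmented version memory- and cache-friendly for large n.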


(*) that's π(n) · π(√n) coming from the primes, and no more than that from the composites.
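
(Concretely, at n = 10^6: π(10^6) = 78498 and π(10^3) = 168, so that bound is roughly 78498 × 168 ≈ 1.3×10^7 operations from the primes alone.)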

– Will Ness