1

I saw this c code of using Sieve method of Eratosthenes to find primes, but I cannot extend it to even larger integers (for example, to 1000000000 and even larger) because of memory consumption to allocate such a large char array.

What would be the strategies to extend the code to larger numbers? Any references are also welcome.

Thanks.

starblue
  • 55,348
  • 14
  • 97
  • 151
Qiang Li
  • 10,593
  • 21
  • 77
  • 148
  • How is memory consumption a problem? 10**9 bits is about 120 MB, which isn't too much on today's personal computers. Actually, it seems that even 2**32 (the whole range of unsigned 32 bit integers) fits into 512 MB. Sure, that's a lot for a single application, but memory is plenty these days. –  Oct 05 '11 at 20:17
  • I've checked the results with two other calculators and mental arithmetic suggests the results can't be completely off. 10**9 bit is 125 * 10**6 byte (divide by 8) is roughly 122 * 10**3 kilobyte (divide by 1024) is roughly 119 megabyte (divide by 1024). Please explain how you arrive at 1GB? –  Oct 05 '11 at 20:34
  • @Qiang Li: Try making the array `static`. – caf Oct 06 '11 at 04:17
  • @caf: I assume you meant to make the array global? what is the difference here, static vs. nonstatic? I am not aware of this issue. Thanks a lot. – Qiang Li Oct 06 '11 at 04:40
  • No, it is not necessary to make it global, you can just add the `static` storage class specifier. Objects with static storage duration exist just once for the entire invocation of the program, so they are typically not stored on the stack but in a separate memory region. This will make a difference to your case because the stack is normally quite limited in maximum size. – caf Oct 06 '11 at 05:12
  • @Jerry Coffin: Sieve of Atkin is faster only for large (>=1e9) numbers http://stackoverflow.com/questions/622/most-efficient-code-for-the-first-10000-prime-numbers/175956#175956 – jfs Oct 21 '11 at 10:51
  • Sieve of Atkin implementation for small ~32bit numbers: http://cr.yp.to/primegen.html – jfs Oct 21 '11 at 10:52

3 Answers3

2

The standard improvement to apply would be to treat each i-th bit as representing the number 2*i+1, thus representing odds only, cutting the size of the array in half. This would also entail, for each new prime p, starting the marking-off from p*p and incrementing by 2*p, to skip over evens. 2 itself is a special case. See also this question with a lot of answers.

Another strategy is to switch to the segmented sieve. That way you only need about pi(sqrt(m)) = 2*sqrt(m)/log(m) memory (m being your upper limit) set aside for the initial sequence of primes with which you'd sieve smaller fixed-sized array, sequentially representing segments of numbers. If you only need primes in some narrow far away range [m-d,m], you'd skip directly to sieving that range after all the needed primes have been gathered, as shown e.g. in this answer.

Per your specifics, to get primes up to 10^9 in value, working with one contiguous array is still possible. Using a bitarray for odds only, you'd need 10^9/16 bytes, i.e. about 60 MB of memory. Easier to work by segments; we only need 3402 primes, below 31627, to sieve each segment array below 10^9.

Community
  • 1
  • 1
Will Ness
  • 70,110
  • 9
  • 98
  • 181
0

Exactly because of the size of the array required, the Sieve of Eratosthenes becomes impractical at some point. A modified sieve is common to find larger primes, (as explained on Wikipedia).

driis
  • 161,458
  • 45
  • 265
  • 341
0

You could use gmp library. See Speed up bitstring/bit operations in Python? for the fast implementation of Sieve of Eratosthenes. It should be relatively easy to translate the provided solutions to C.

Community
  • 1
  • 1
jfs
  • 399,953
  • 195
  • 994
  • 1,670