7

Currently I have this prime generator that is limited to n < 2^32-1. I'm not entirely sure how I could expand the limit further, given the limit of elements in an array.

Sieve:

public class Main {

public static void main(String args[]){
    long N = 2000000000;

    // initially assume all integers are prime

    boolean[] isPrime = new boolean[N + 1];
    for (int i = 2; i <= N; i++) {
        isPrime[i] = true;
    }

    // mark non-primes <= N using Sieve of Eratosthenes
    for (int i = 2; i*i <= N; i++) {

        // if i is prime, then mark multiples of i as nonprime
        // suffices to consider mutiples i, i+1, ..., N/i
        if (isPrime[i]) {
            for (int j = i; i*j <= N; j++) {
                isPrime[i*j] = false;
            }
        }
    }
}
}

How could I modify this to go past n = 2^32-1?

Arman
  • 655
  • 2
  • 7
  • 23
  • 2
    Create another array? :) – Nir Alfasi Sep 21 '15 at 04:33
  • @alfasin Is there a better way to do this? Like create more space programmatically, maybe in a 2D Array? – Arman Sep 21 '15 at 04:41
  • What is the use case? I've used sieve a lot of times, and have never really needed one of size > 10^7 – vish4071 Sep 21 '15 at 05:31
  • @vish4071 It's just a project I've been working on. – Arman Sep 21 '15 at 05:38
  • Project is fine. I meant, bottleneck of implementation of sieve is that it takes a lot of memory. Most (common) languages don't provide that much. Even using `boolean` array to calculate upto 2^32 primes is good, but it sure would be taking a lot of time. (Complexity is O(n loglog n), which is of the most optimized implementation, most of the implementations I've seen have complexity O(n log n)) – vish4071 Sep 21 '15 at 05:47
  • @vish4071 What would your recommendation be, for going past 2^32? – Arman Sep 21 '15 at 05:49
  • So, if you want primes >10^9, may be you don't need all primes for a particular test case (a range maybe works for you). What is the absolute upper limit of primes that you want? – vish4071 Sep 21 '15 at 05:49
  • For example, if you want primes in range (say) 5*10^9 to 5.001*10^9 (see, range 10^6) for any particular test case, you don't even need to calculate all primes (from 0 to 5.001*10^9) in the first place. First, calculate primes upto 10^5, and then use what is known as `segmented sieve` – vish4071 Sep 21 '15 at 05:52
  • I'm not entirely sure what my upper bound should be, probably at most 10 billion. So, should I calculate these in chunks? – Arman Sep 21 '15 at 05:56
  • I'd recommend that. No ds actually allows that much of space. 10 billion (even boolean) means 10^10 bits = 1.125*10^9 bytes ~ 1GB. You actually require 1GB of RAM for your code to run. This is really hard on hardware and even will degrade the performance of your application. – vish4071 Sep 21 '15 at 06:13
  • For every situation that I have faced, `segmented sieve` has worked for me. If it does not solve your use case, you can tell me in detail. May be I'll be of help. – vish4071 Sep 21 '15 at 06:16
  • Use a segmented sieve as described [here](http://stackoverflow.com/a/10249801/448810). – user448810 Sep 21 '15 at 12:22

3 Answers3

4

You may use an array of BitSet objects to represent long bit-set. Here's the complete example:

public class Main {
    private static class LongBitSet {
        // max value stored in single BitSet
        private static final int BITSET_SIZE = 1 << 30;

        BitSet[] bitsets;

        public LongBitSet(long limit) {
            bitsets = new BitSet[(int) (limit/BITSET_SIZE+1)];
            // set all bits by default
            for(int i=0; i<bitsets.length; i++) {
                bitsets[i] = new BitSet();
                int max = (int) (i == bitsets.length-1 ?
                          limit % BITSET_SIZE : BITSET_SIZE);
                bitsets[i].set(0, max);
            }
        }

        // clear specified bit
        public void clear(long pos) {
            bitsets[(int) (pos / BITSET_SIZE)].clear((int) (pos % BITSET_SIZE));
        }

        // get the value of the specified bit
        public boolean get(long pos) {
            return bitsets[(int) (pos / BITSET_SIZE)].get((int) (pos % BITSET_SIZE));
        }

        // get the number of set bits
        public long cardinality() {
            long cardinality = 0;
            for(BitSet bs : bitsets) {
                cardinality += bs.cardinality();
            }
            return cardinality;
        }
    }

    public static void main(String args[]) {
        long N = 4000000000L;

        // initially assume all integers are prime

        LongBitSet bs = new LongBitSet(N+1);
        // clear 0 and 1: non-primes
        bs.clear(0);
        bs.clear(1);

        // mark non-primes <= N using Sieve of Eratosthenes
        for (long i = 2; i * i <= N; i++) {
            if (bs.get(i)) {
                for (long j = i; i * j <= N; j++) {
                    bs.clear(i * j);
                }
            }
        }
        System.out.println(bs.cardinality());
    }
}

This program for N = 4_000_000_000L takes around 512Mb of memory, works couple of minutes and prints 189961812 which is correct number of primes below 4 billions according to Wolfram Alpha. If you have enough RAM, you may try setting even bigger N.

Tagir Valeev
  • 97,161
  • 19
  • 222
  • 334
2

You can segment the sieve: instead of allocating a single, gigantic array you allocate many small arrays. If you want to find the primes up to 10^10 you might use arrays (or better yet, BitSets) of size 10^6 or so. Then you run the sieve 10^4 times. Each time you run a new segment you need to find out where to start each prime in the sieve, but that's not really too hard.

In addition to allowing much smaller memory use this keeps more of the memory in the cache, so it's often significantly faster.

Charles
  • 11,269
  • 13
  • 67
  • 105
1

I see options:

  1. pack 16 numbers / 1 BYTE

    • remember odd numbers only per each bit
    • use unsigned variable to avoid sign bit waste
  2. use more than 1 table

    • but in 32bit app you are limited by OS capabilities
    • usually to 1/2/4 GB of usable memory
    • such big table usually does not fit inside CACHE so it is not very fast anyway
  3. you can use more approaches at once

    • I combine periodic sieves with found prime list binary search
    • If you choose the sizes right you can even improve performance
    • by better using platform Caching properties
    • see Prime numbers by Eratosthenes quicker sequential than concurrently?
    • the idea is to use sieves to test small divisors
    • and only then check the presence inside prime list
    • it is not requiring that much memory
    • and is pretty fast
  4. to spare memory you can combine 16/32/64 bit variables

    • use small bit width while you can
    • so have prime list divided to 3 groups: small/medium/big
    • if you want also bigints add them as 4th list
Community
  • 1
  • 1
Spektre
  • 49,595
  • 11
  • 110
  • 380