0

I want to find all the prime numbers under 10 billion. Which is 5 times as big as int can hold (which is the limitation of arrays regardless of type). Attempting to allocate over 1.2billion at a time results in out of heap space error. I tried using List instead of a boolean array, but the set element method for arrayLists only indexes up to int. What bugs me, is that pretty quickly into the sieve there are less than an integer number of elements not crossed off. One method that should work, would be to create a partition of 10 arrays and smash them together... but that would be ugly. Let me know if you have any suggestions of an elegant way to solve this. (Other than using Python lol). I already have an n^2/2 brute force implementation, but that takes a long time to run so really I want to solve this as big O fast as possible. My Sieve implementation that works up to 1.2Billion is as follows:

public class SieveEratosthenes {
private boolean[] nums;
public static void main(String[] args) {
    int n = 1000000;
    SieveEratosthenes s = new SieveEratosthenes(n);
    for(int i=0;i<s.nums.length;i++){
        if(s.nums[i]){
            System.out.println(i);
        }
    }
}

public SieveEratosthenes(int max){
    sieve(max);
}

private boolean[] sieve(int max){
    nums = new boolean[max+1];
    initFlags();
    for(int i=2;i*i<max;i++){
        for(int j=i*i;j<=max;j+=i){//cross off non-primes
            nums[j]=false;
        }
    }
    return nums;
}
private void initFlags(){
    if(nums != null&&nums.length>1){
        nums[0]=false;
        nums[1]=false;
        nums[2]=true;
    }
    for(int i=3;i<nums.length;i++){
        nums[i]=true;
    }
}

public List<Long> sieveToList(){
    List<Long> sieveList = new ArrayList();
    for(int i=0;i<nums.length;i++){
        if(nums[i]){
            sieveList.add((long)i);
        }
    }
    return sieveList;
}
woodlumhoodlum
  • 462
  • 3
  • 10
  • 24
  • 2
    Use `BigInteger` class of Java, that would be a lot less pain! – Am_I_Helpful Sep 04 '15 at 04:51
  • You could put only prime numbers in HashSet. This would save a lot of memory. – schrieveslaach Sep 04 '15 at 04:55
  • 1
    @Am_I_Helpful The asker is not using `int`s as the elements a list of primes, but a `boolean[]` where the _n_ th element describes the primality of _n_. The problem is that array indices are `int` values, so only the primality of numbers up to `MAX_INT` can be described. Unless there is a list structure indexed by `BigInteger` values, using `BigInteger` cannot replace `int` in this implementation. – Alden Sep 04 '15 at 05:00
  • @Alden - That's what my suggestion is, remove array practice, bring a List or something and then implement the same program. That would be a much less pain. I suggested him to change his way of implementation! – Am_I_Helpful Sep 04 '15 at 05:12
  • 1
    http://stackoverflow.com/questions/8804435/alternative-to-java-bitset-with-array-like-performance maybe the `OpenBitSet` (see this [answer](http://stackoverflow.com/a/32196710/180100)). Also note that you could leverage the fact that 2^n is never a prime –  Sep 04 '15 at 05:23
  • Good suggestions @Schrieveslaach and RC. I'll likely try the HashSet, being as I'm studying for an interview it would be a more common data structure to practice with. – woodlumhoodlum Sep 04 '15 at 05:38
  • As for: @Am_I_Helpful, for the reasons Alden and I mentioned- there are still issues with your approach. When I tried redoing the program using a List I ran into the problem of not being able to use the "set" method to cross off non-primes due to the method's argument requiring an int. My non-sieve approach(not shown) matches what you are thinking of. You are welcome to have a go at it though! – woodlumhoodlum Sep 04 '15 at 05:38

3 Answers3

2

Here is one approach that you can use:

  • Use sieve for 10^7 ints or whatever size is suitable for you.
  • Then, for every implementation of sieve, at the end, save all computed primes in any data-structure you are comfortable with (ArrayList would do).
  • Now, do this for 1000 times, using loop (of course) and every time, your sieve would compute primes in next 10^7 range. So, on 1st iteration, all primes from 0-10^7 would be computed. Then, from 10^7+1 to 2*10^7 and so on.

PS: If you want the code, I'll do it for you but I recommend you to try it once. I may be wrong on this but I think this approach is what they call segmented sieve.

vish4071
  • 5,135
  • 4
  • 35
  • 65
  • Yeah I had definitely thought about splitting it up into segments, but I was hoping there would be a way with a single data structure. In this approach I'll be sure to not store them all at once. I think once one of the segments is computed I'll offload the boolean[] into an arrayList while converting from boolean position to the value of the prime. That way only the primes are stored in the arrayList, and there will be plenty of room as only 5.4% of #'s below 10billion are prime. I would then use the arrayList to start crossing off the next segment, and loop as you said. – woodlumhoodlum Sep 04 '15 at 06:45
  • 1
    Yes exactly. If you have any problem with that implementation, post it here and I think I'll be of help. – vish4071 Sep 04 '15 at 06:46
  • 1
    Hey, where did you find that 5.4% of #'s below 10^10 are prime? A good approximation is given by (N / ln(N)) primes in range 0 to N. (ln(N) is log base e of N), which gives 4.34%. – vish4071 Sep 04 '15 at 06:49
  • 10^8, actually. When I first solved this brute force I wrote down how many primes there were in all the 10^x's below 10^8 so that I could estimate how many hours it would take to run lol! But yeah think about how many primes there are in 10^2 and see how it gradually decreases. – woodlumhoodlum Sep 04 '15 at 06:52
  • yeah I think 4.34 is spot on for 10^10 if 10^8 is 5.4. – woodlumhoodlum Sep 04 '15 at 06:53
0

You should probably move away from using arrays for this. As you say they are not well suited to very large sets. A reasonable approximation of the algorithm is to 'cross out' as you examine each prime by testing it it's a multiple of any previous primes. I haven't analysed the performance of this.

class Sieve {
    private long current = 2;
    private final List<Long> primes = new ArrayList<>();

    public long nextPrime() {
        while (primes.stream().anyMatch(p -> current % p == 0))
            current++;
        primes.add(current);
        return current;
    }
}
sprinter
  • 27,148
  • 6
  • 47
  • 78
  • Just tested yours. I'm sorry to say that although that is one beautiful piece of code, both my sieve and brute force methods respond nearly instantaneously to compute primes under 100000. But your method takes 3s. So I'm not sure if it's the parallelStream overhead or the "anyMatch" portion that takes a long time. I haven't done much in java 8, and it's great to see a beautiful looking lambda approach. I think it'll come in handy on future problems. Thanks again! – woodlumhoodlum Sep 04 '15 at 06:13
  • 1
    @woodlumhoodlum I found something interesting in looking at the performance of this. On my (multi-core) mac it runs 4 times faster with `stream` than with `parallelStream`. I'm going to have to investigate that. I guess we shouldn't be too surprised that it's slower than your array based solution given streams and collections do have overheads. – sprinter Sep 04 '15 at 06:17
  • Changing to stream() on mine made it run in 2s consistently. A second saved, is a second earned. I have a mid-range i5. – woodlumhoodlum Sep 04 '15 at 06:48
0

If memory isn't an issue, continue using an array because it is much faster. If memory becomes an issue, I would suggest looking into BitSets.

While Java data structures are limited (as far as I know) to the maximum int size at around 2 billion, you could create your own data structure. A very simple solution would be to create a class that splits up your requested size into several arrays or bitsets given the max int length, and then automatically accesses them given by a long index input that you enter. I hope that makes sense. Let me know if you have any questions!

Ofek Gila
  • 693
  • 1
  • 10
  • 21
  • What type of arguments do the `java.util.BitSet` methods take? – greybeard Nov 23 '15 at 07:38
  • @greybeard, the odcumentation is here: https://docs.oracle.com/javase/8/docs/api/java/util/BitSet.html It uses boolean arguments (integer for index) – Ofek Gila Nov 23 '15 at 22:37