11

I was trying to get all primes before 600851475143. I was using Sieve of Eratosthenes for this. This requires me to create a boolean array of that huge size. Bad idea, you can run out of memory. Any other way. I tried using a string, using each index with values 0 & 1 to represent true or false. but indexOf method too returns int.

Next i am using 2d array for my problem. Any other better way to store such a huge array?

adrianp
  • 2,491
  • 5
  • 26
  • 44
S Kr
  • 1,831
  • 2
  • 25
  • 50
  • 19
    "I was trying to get all primes before 600851475143." That's totally the wrong approach for that Project Euler problem. – Daniel Fischer Apr 15 '13 at 12:10
  • 7
    I would suggest that if your solution requires you to make 600 BILLION array entries, then you need to take a new approach. – Patashu Apr 15 '13 at 12:11
  • 5
    @Ashok Vector being backed by an array, I don't see how that will make a difference. – assylias Apr 15 '13 at 12:12
  • 2
    @Ashok Vectors are backed by arrays and are synchronized. Wont help. – Deepak Bala Apr 15 '13 at 12:13
  • +1 : vectors can't be used. – S Kr Apr 16 '13 at 07:58
  • If this is to solve the euler problem, it might be easier to just find the factors first and THEN figure out the primes rather than vice versa. You only have to find the factors between 2 and sqrt(600851475143), which makes this much more doable – xyz Apr 19 '13 at 07:35

5 Answers5

7

The memory requirement for 600851475143 booleans is at best 70Gb. This isn't feasible. You need to either use compression as suggested by Stephan, or find a different algorithm for calculating the primes.

Alan
  • 3,307
  • 1
  • 19
  • 22
  • 7
    It is feasible, but probably not realistic! – assylias Apr 15 '13 at 12:19
  • Well, I should probably have clarify that while possible, it is not feasible (aka practical) with anything less than a seriously highend computer (or possibly supercomputer). – Alan Apr 15 '13 at 12:21
  • I don't want to use a library, so even though EWAHCompressedBitmap is very promising, i will use BitSets of size 32 mb each. And add lazy loading to it. Looking for better option. The tradition way for this problem is too slow, but does the job. – S Kr Apr 16 '13 at 09:42
6

I had a similar problem and i used a bit set (basically set 1 or 0 for the desired offset in order) and i recomend using EWAHCompressedBitmap it will also compress your bit set

EDIT

As Alan said the BitSet will occupy 70GB of memory but you can do another thing : to have multiple BitSets (consecutive ones so that you can calculate the absolute position) and load in memory just the BitSet that you need in that moment something like a lazy load, in this case you will have control of the memory used.

Stephan
  • 8,000
  • 3
  • 36
  • 42
2

Its not really practical to remember for each number if it was a prime or not for such a large amount (the sieve is a very slow approach for large numbers in general).

From this link you get an idea how many primes there are to be expected smaller than X. For your 600 billion range you can expect roughly 20 billion primes to exist within that range. Storing them as long[] would require about 160GB of memory... that notably more than the suggested 70GB for storing a single bit for each number, half if you exclude even numbers (2 is the only even prime).

For a desktop computer 35GB in memory may be a bit much, but a good workstation can have that much RAM. I would try a two-dimensional array with bit shifting/masking.

I still would expect your sieve code to run a considerable amount of time (something from days to years). I suggest you investigate more advanced prime detection methods than sieve.

Durandal
  • 19,919
  • 4
  • 36
  • 70
2

You could use HotSpot's internal sun.misc.Unsafe API to allocate a bigger array. I wrote a blogpost how to simulate an array with it However, it's not an official Java API, so it qualifies as a hack.

Korbi
  • 1,438
  • 4
  • 15
  • 22
0

Use BitSet. You can then set bit any index element. 600851475143 is 2^39 thus taking only 39 bits internally (actually in reality it will occupy 64 bits as it uses long).

You can infact move upto 2^63 which is massive for most purposes

Marco Forberg
  • 2,634
  • 5
  • 22
  • 33
Jatin
  • 31,116
  • 15
  • 98
  • 163