1

Problem 10 from Project Euler:

The program runs for smaller numbers and slows to a crawl in the hundred thousands. At 2 million, an answer fails to show up even though the program seems like it is still running.

I'm trying to implement the Sieve of Eratosthenes. It is supposed to be very fast. What's wrong with my approach?

import java.util.ArrayList;

public class p010
{
  /**
   * The sum of the primes below 10 is 2 + 3 + 5 + 7 = 17
   * Find the sum of all the primes below two million.
   * @param args
   */
  public static void main(String[] args)
  {
    ArrayList<Integer> primes = new ArrayList<Integer>();
    int upper = 2000000;
    for (int i = 2; i < upper; i++)
    {
      primes.add(i);
    }
    int sum = 0;
    for (int i = 0; i < primes.size(); i++)
    {
      if (isPrime(primes.get(i)))
      {
        for (int k = 2; k*primes.get(i) < upper; k++)
        {
          if (primes.contains(k*primes.get(i)))
          {
            primes.remove(primes.indexOf(k*primes.get(i)));
          }
        }
      }
    }
    for (int i = 0; i < primes.size(); i++)
    {
      sum += primes.get(i);
    }
    System.out.println(sum);
  }

  public static boolean isPrime(int number)
  {
    boolean returnVal = true;
    for (int i = 2; i <= Math.sqrt(number); i ++)
    {
      if (number % i == 0)
      {
        returnVal = false;
      }
    }
    return returnVal;
  }

}
Will Ness
  • 70,110
  • 9
  • 98
  • 181
Chris Zhang
  • 968
  • 7
  • 16
  • Maybe you are running out of RAM? – Babblo Jun 01 '13 at 23:33
  • 1
    That's a bad implementation of the sieve of eratosthenes. Refer to the [wikipedia link](http://en.wikipedia.org/wiki/Sieve_of_Eratosthenes) and check the gif example. – Luiggi Mendoza Jun 01 '13 at 23:33
  • 2
    In the `isPrime()` method, you can just check all the prime dividers (up to `sqrt(number)`). If a number is not a prime, it will be divisible by a prime (it cannot by divided by 6 if it cannot be divided by 2 or 3). – SJuan76 Jun 01 '13 at 23:37
  • This is not an inefficient implementation of the sieve. It is just a brute force prime tester. – Peter Wooster Jun 01 '13 at 23:39
  • Instead of adding all numbers to a list (huge memory/time consumption) and deleting them later (another huge loss), just add the prime numbers when you find it. You do not need a list to tell you that `43` is after `42`, don't you? – SJuan76 Jun 01 '13 at 23:40
  • 1
    @SJuan76: This is what {he,she} is trying to do, but the loop is badly broken: should return false on the first successful test. Might be 50% faster if just testing for odd divisors. Might be a lot faster when storing the primes in bitmaps (long[]) – Gyro Gearless Jun 01 '13 at 23:41
  • @GyroGearless He is not checking just primes, but all of the integers between `2` and `sqrt(number)`. So, if he gets that `number` is not divisible by `2` or `3`, he will still check the division by `6`. Anyway I missed the fact that he does not return on false, well spotted. – SJuan76 Jun 01 '13 at 23:44
  • @PeterWooster that answer is not a duplicate. Fr one thing it asks about C++, not Java; 2nd, it is plain trial division while this one *attempts* to implement a sieve of Eratosthenes. **Not** a duplicate. :) – Will Ness Jun 04 '13 at 05:50
  • @PeterWooster it is also wrong to say that this is "just a brute force prime tester". Though it uses `isPrime` which *is* a sub-optimal trial division method, it then tries to *remove* the prime's multiples *without* testing them, which reminds of a sieve. – Will Ness Jun 04 '13 at 06:39
  • **this closure as "duplicate" is scandalous.** The *faux-"duplicate"* is C++ trial-division code; this here is an attempt (in Java) to implement **the sieve of Eratosthenes**, implicitly asking why _this_ _specific_ _code_ is not a proper implementation of the sieve. Of course the *faux-"duplicate"* does **not** answer __this__ question (and contains a buggy code, of **trial division**, to boot). A question is more than just its title! – Will Ness Jun 15 '13 at 13:52

5 Answers5

5

You appear to be trying to implement the Sieve of Eratosthenes which should perform better that O(N^2) (In fact, Wikipedia says it is O(N log(log N)) ...).

The fundamental problem is your choice of data structure. You've chosen to represent the set of remaining prime candidates as an ArrayList of primes. This means that your test to see if a number is still in the set takes O(N) comparisons ... where N is the number of remaining primes. Then you are using ArrayList.remove(int) to remove the non-primes ... which is O(N) also.

That all adds up to making your Sieve implementation worse than O(N^2).

The solution is to replace the ArrayList<Integer> with an boolean[] where the positions (indexes) in the boolean array represent the numbers, and the value of the boolean says whether the number is prime / possibly prime, or not prime.

(There were other problems too that I didn't notice ... see the other answers.)

Stephen C
  • 698,415
  • 94
  • 811
  • 1,216
  • Thank you kind sir and to all those who helped answer. I did end up using a boolean[] array that cut the running time down to a paltry 2-3 seconds on my machine. 142913828922 was my correct answer. – Chris Zhang Jun 02 '13 at 01:35
  • I'd say that the *fundamental* problem is that OP tries to *remove* composites from the candidates list (instead of just marking them), thus destroying the possibility of direct addressing. Even on an `int[]`, after the *first* removal the direct addressing is impossible. -- Of course with `boolean[]`, there's no removing anything, the Conflation of index and value leads to Great Simplification and all the problems disappear as if by themselves. :) But if one tries to *remove* composites in the first place, last thing they'd use is `boolean[]`. :) – Will Ness Jun 04 '13 at 06:02
  • and since [`ArrayList` implements `RandomAccess`](http://docs.oracle.com/javase/6/docs/api/java/util/ArrayList.html), **it *can* be used** to implement a proper sieve, when used correctly (i.e. no `remove`, only `set`). Just the memory consumption will be bigger than it has to be, but for 2 mln is not a big problem. – Will Ness Jun 04 '13 at 06:23
  • @WillNess - Feel free to write your own Answer :-) – Stephen C Jun 04 '13 at 07:14
3

There are a few issues here. First, lets talk about the algorithm. Your isPrime method is actually the very thing that the sieve is designed to avoid. When you get to a number in the sieve, you already know it's prime, you don't need to test it. If it weren't prime, it would already have been eliminated as a factor of a lower number.

So, point 1:

  • You can eliminate the isPrime method altogether. It should never return false.

Then, there are implementation issues. primes.contains and primes.remove are problems. They run in linear time on an ArrayList, because they require checking each element or rewriting a large portion of the backing array.

Point 2:

  • Either mark values in place (use boolean[], or use some other more appropriate data structure.)

I typically use something like boolean primes = new boolean[upper+1], and define n to be included if !(primes[n]). (I just ignore elements 0 and 1 so I don't have to subtract indices.) To "remove" an element, I set it to true. You could also use something like TreeSet<Integer>, I suppose. Using boolean[], the method is near-instantaneous.

Point 3:

  • sum needs to be a long. The answer (roughly 1.429e11) is larger than the maximum value of an integer (2^31-1)

I can post working code if you like, but here's a test output, without spoilers:

public static void main(String[] args) {
    long value;
    long start;
    long finish;

    start = System.nanoTime();
    value = arrayMethod(2000000);
    finish = System.nanoTime();
    System.out.printf("Value: %.3e, time: %4d ms\n", (double)value, (finish-start)/1000000);

    start = System.nanoTime();
    value = treeMethod(2000000);
    finish = System.nanoTime();
    System.out.printf("Value: %.3e, time: %4d ms\n", (double)value, (finish-start)/1000000);
}

output:

Using boolean[]
    Value: 1.429e+11, time:   17 ms
Using TreeSet<Integer>
    Value: 1.429e+11, time: 4869 ms

Edit: Since spoilers are posted, here's my code:

public static long arrayMethod(int upper) {
    boolean[] primes = new boolean[upper+1]; 
    long sum = 0;
    for (int i = 2; i <=upper; i++) {
        if (!primes[i]) {
            sum += i;
            for (int k = 2*i; k <= upper; k+=i) {
                primes[k] = true;
            }
        }
    }
    return sum;
}

public static long treeMethod(int upper) {
    TreeSet<Integer> primes = new TreeSet<Integer>();
    for (int i = 2; i <= upper; i++) {
        primes.add(i);
    }
    long sum = 0;
    for (Integer i = 2; i != null; i=primes.higher(i)) {
        sum += i;
        for (int k = 2*i; k <= upper; k+=i) {
            primes.remove(k);
        }
    }
    return sum;
}
maybeWeCouldStealAVan
  • 15,492
  • 2
  • 23
  • 32
  • He needs the `isPrime` method. Just do not apply them to the members of the `prime` list, but to the new others, the successive integers of the main loop (`i`). And for it, a list is more efficient (just iterate it to find if any of the primes is a divisor of the `i` being checked). – SJuan76 Jun 02 '13 at 00:33
  • 1
    @SJuan76 Nope, he doesn't. Try his code as is (though, with a smaller value for upper), then replace the code in `isPrime` with `return true;`. Your answer will not change. – maybeWeCouldStealAVan Jun 02 '13 at 00:44
  • you should never `remove` anything, in a sieve. Your timings prove it. So, no place for `either` in your 2nd point. :) Nevertheless, your answer is the most to-the-point, IMO. – Will Ness Jun 04 '13 at 06:56
  • I agree -- the tree structure was more of a toy. – maybeWeCouldStealAVan Jun 04 '13 at 17:21
  • I've added emphasis to the very important point you make there. :) – Will Ness Jun 15 '13 at 13:33
  • Perhaps, but you make me look like I'm shouting. :-) I meant to rollback your edits and replace them with a simple *emphasis*, but I didn't realize that rollbacks count as edits, and I'm already on the edge of community wiki status, so I'll leave it be. Thanks for your comments, but please make no more edits :-) – maybeWeCouldStealAVan Jun 15 '13 at 18:18
  • Sorry that it wasn't to your liking. :) – Will Ness Jun 15 '13 at 22:31
0

Two things:

Your code is hard to follow. You have a list called "primes", that contains non prime numbers!

Also, you should strongly consider whether or not an array list is appropriate. In this case, a LinkedList would be much more efficient.

Why is this? An array list must constantly resize an array by: asking for new memory to create an array, then copying the old memory over in the newly created array. A Linked list would just resize the memory by changing a pointer. This is a lot quicker! However, I do not think that by making this change you can salvage your algorithm.

You should use an array list if you need to access the items non-sequentially, here, (with a suitable algorithm) you need to access the items sequentially.

Also, your algorithm is slow.Take the advice of SJuan76 (or gyrogearless), thanks sjuan76

  • GyroGearless advice is a good one too, `isPrime()` method should return as fast as he finds that the number is a prime. No sense checking all the numbers if you find that the number is even at the first iteration!. – SJuan76 Jun 01 '13 at 23:58
  • `LinkedList` would speed up `remove`, but not `contains`. – maybeWeCouldStealAVan Jun 02 '13 at 00:13
  • Contains speed would be the same with a LinkedList or ArrayList? – je_suis_beau Jun 02 '13 at 00:19
  • It's normal to have non-primes in a sieve called `primes`, as the non-primes are removed in the process. [Even I do it that way](http://stackoverflow.com/a/1043247/49246). – starblue Jun 02 '13 at 21:27
  • @starblue if you remove *anything* then it's not a sieve. But [you do not remove, you use `primes.clear(j);`](http://stackoverflow.com/a/1043247/49246). This *marks* the composites, as it should. :) Words can mislead; trying to *remove* composites is what got the OP into troubles in the first place. – Will Ness Jun 04 '13 at 06:11
  • the **key** to the sieve's efficiency is the direct (i.e. **non-sequential**) access to and/or direct generation of multiples, that's the whole point to the sieve *of Eratosthenes*. The "advice of SJuan76" would just make it a trial division method, not a sieve. – Will Ness Jun 04 '13 at 09:42
0

Your program is not the Sieve of Eratosthenes; the modulo operator gives it away. Your program will be O(n^2), where a proper Sieve of Eratosthenes is O(n log log n), which is essentially n. Here's my version; I'll leave it to you to translate to Java with appropriate numeric datatypes:

function sumPrimes(n)
    sum := 0
    sieve := makeArray(2..n, True)
    for p from 2 to n step 1
        if sieve[p]
            sum := sum + p
            for i from p * p to n step p
                sieve[i] := False
    return sum

If you're interested in programming with prime numbers, I modestly recommend this essay at my blog.

user448810
  • 17,381
  • 4
  • 34
  • 59
0

The key to the efficiency of classic implementation of the sieve of Eratosthenes on modern CPUs is the direct (i.e. non-sequential) memory access. Fortunately, ArrayList<E> does implement RandomAccess.

Another key to the sieve's efficiency is its conflation of index and value, just like in integer sorting. Actually removing any number from the sequence destroys this ability to directly address without any computations. We must mark, not remove, any composite as we find them, so any numbers greater than it will remain in their places in the sequence.

ArrayList<Integer> can be used for that (except taking more memory than is strictly necessary, but for 2 million this is inconsequential).

So your code with a minimal edit fix (also changing sum to be long as others point out too), becomes

import java.util.ArrayList;

public class Main
{
  /**
   * The sum of the primes below 10 is 2 + 3 + 5 + 7 = 17
   * Find the sum of all the primes below two million.
   * @param args
   */
  public static void main(String[] args)
  {
    ArrayList<Integer> primes = new ArrayList<Integer>();
    int upper = 5000;
    primes.ensureCapacity(upper);
    for (int i = 0; i < upper; i++) {
      primes.add(i);
    }
    long sum = 0;
    for (int i = 2; i <= upper / i; i++) {
      if ( primes.get(i) > 0 ) {
        for (int k = i*i; k < upper ; k+=i) {
          primes.set(k, 0);
        }
      }
    }
    for (int i = 2; i < upper; i++) {
      sum += primes.get(i);
    }
    System.out.println(sum);
  }
}

Finds the result for 2000000 in half a second on Ideone. The projected run time for your original code there: between 10 and 400 hours (!).

To find rough estimates for the run time when faced with a slow code, you should always try to find out its empirical orders of growth: run it for some small size n1, then a bigger size n2, record the run times t1 and t2. If t ~ n^a, then a = log(t2/t1) / log(n2/n1).

For your original code the empirical orders of growth measured on 10k .. 20k .. 40k range of upper limit value N, are ~ N^1.7 .. N^1.9 .. N^2.1. For the fixed code it's faster than ~ N (in fact, it's ~ N^0.9 in the tested range 0.5 mln .. 1 mln .. 2 mln). The theoretical complexity is O(N log (log N)).

Will Ness
  • 70,110
  • 9
  • 98
  • 181