4

I need to do a programming assignment for the university. The task is to implement a program that returns all solutions of the equation q² + 3qp + p² = r² (q,p,r prime numbers). Subsequently, the program is to be speeded up by parallelization. Unfortunately we have to use BigInteger, so don't be surprised.

This is my class I wrote that calculates exactly this equation.

    public boolean calculateEquation() {
    //Equation: p² + 3pq + q² = r²
    if (calculatePSquare().add(calculate3TimesPQ()).add(calculateQSquare()).equals(calculateRSquare())) {
        System.out.println("p: " + p + " q: " + q + " r: " + r);
    }
    return calculatePSquare().add(calculate3TimesPQ()).add(calculateQSquare()).equals(calculateRSquare());
}
@Override
public void run() {
    calculateEquation();
}

Full code for that class: https://pastebin.com/wwrDurUT

My next step was to test the code and stop the time to see if the parallelization works later. To implement the parallelization I have looked at different threads, among others also the one linked here: What is the easiest way to parallelize a task in java?

And this was my result:

ExecutorService executorService = Executors.newFixedThreadPool(Configuration.instance.maximumNumberOfCores);
        ExecutorCompletionService executorCompletionService = new ExecutorCompletionService(executorService);

        for (BigInteger pValue : possiblePValues) {
            for (BigInteger qValue : possibleQValues) {
                for (BigInteger rValue : possibleRValues) {
                    executorCompletionService.submit(new Thirteen(pValue, qValue, rValue), null);
                }
            }
        }
        executorCompletionService.take();

Full code: https://pastebin.com/kviEnnFH

Now to what's funny, the parallelized version is only faster if the number of tasks is small. For all primes between 0-500, the parallelized version is faster. If I take all primes between 0 and 2000, the results look very different:

All primes between 0 and 100:

Not parallelized: Task took: 110ms

Parallelized: Task took: 64ms

All primes between 0 and 2000:

Not parallelized: Task took: 7797ms

Parallelized: Task took: 25799ms

Since there are so few easily understandable resources on the subject, and I honestly don't quite understand what exactly the code of mine does, I'm very surprised why it behaves as it does.

user207421
  • 305,947
  • 44
  • 307
  • 483
XaNNy0
  • 169
  • 1
  • 8
  • 1
    To exclude any surprises I would check how garbage collector behaves during the execution. Numbers are big, so it is possible that majority of time your app spends doing GC. – yegodm Feb 20 '19 at 22:25
  • Didn't think about GC at all, do you have any example for me how to check it ? – XaNNy0 Feb 20 '19 at 22:34
  • Try `-XX:+PrintGC` JVM parameter. – yegodm Feb 20 '19 at 22:36

1 Answers1

1

First of all, there is no point in implementing this p² + 3pq + q² = r² equation using a triple embedded for loops, which results in having a complexity of O(n) = n^3. This can be done using only two for loops since if we know the value of p and q, r can be calculated using the equation.

for (BigInteger p: possiblePValues) {
    for (BigInteger q: possibleQValues) {
        BigInteger r = p.pow(2).add(q.pow(2)).add(p.multiply(q).multiply(new BigInteger("3")))
        // decide if sqrt(r) is prime
     }
 }

Now if we have saved the pre-computed primes inside of a Hashmap for example, we can determine if r is prime instantly with a lookup.

About the parallalization: Your approach is wrong, since you are just spawning threads for some operations which take relatively short amount of time to complete. Probably the overhead of thread management takes longer then the operation to calculate the equation itself. I guessing that's why you get longer running period for multithreaded version. There is no golden rule on how to parallelize two embedded for loops. My approach would be to split the first loop in smaller chunks depending on how many cores you have and create some intervals. For example, if we have 100 primes and 4 threads, the first thread will take the p values having the index from 0 to 24, the second from 25 to 49 and so on. The second for loop should run from 0 to 100 in every thread. Using this approach you can compute all the possible r values.

Ervin Szilagyi
  • 14,274
  • 2
  • 25
  • 40
  • My attempt to simply brutforce all possibilities for q,p,r is based on the instructions of our lecturer, your approach makes much more sense. I will test the approach with the splitting of the outer loop. Since you didn't say anything about it, I assume that my method calls and my craft for parallelization were right in themselves or is there a more efficient way for that ? – XaNNy0 Feb 20 '19 at 22:37
  • `take()` should be called multiple times in order to get back the results, and be sure that all the tasks are finished. – Ervin Szilagyi Feb 20 '19 at 22:49