0

I am doing a course in coursera about parallel programming offered by Rice University. In the assignment, we are supposed to recursively calculate the sum of reciprocals of elements of an array. To parallelise this, we used a fork/join pool and recursively calculated the required sum of the left and right subarrays and joined them back. But the execution time I am getting is more than when I do it sequentially. The following the console output in eclipse -

Time for sequential: 0.136 , Value: 7.485471 Time for parallel: 6.674 , Value: 7.485471

The code is as follows -

package edu.coursera.parallel;

import java.util.Random;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

/**
 * Class wrapping methods for implementing reciprocal array sum in parallel.
 */
public final class ReciprocalArraySum
{

    /**
     * Default constructor.
     */
    private ReciprocalArraySum()
    {
    }

    /**
     * Sequentially compute the sum of the reciprocal values for a given array.
     *
     * @param input Input array
     * @return The sum of the reciprocals of the array input
     */
    protected static double seqArraySum(final double[] input)
    {
        double sum = 0;

        // Compute sum of reciprocals of array elements
        for (int i = 0; i < input.length; i++)
        {
            sum += (1 / input[i]);
        }

        return sum;
    }

    /**
     * Computes the size of each chunk, given the number of chunks to create
     * across a given number of elements.
     *
     * @param nChunks The number of chunks to create
     * @param nElements The number of elements to chunk across
     * @return The default chunk size
     */
    private static int getChunkSize(final int nChunks, final int nElements)
    {
        // Integer ceil
        return (nElements + nChunks - 1) / nChunks;
    }

    /**
     * Computes the inclusive element index that the provided chunk starts at,
     * given there are a certain number of chunks.
     *
     * @param chunk The chunk to compute the start of
     * @param nChunks The number of chunks created
     * @param nElements The number of elements to chunk across
     * @return The inclusive index that this chunk starts at in the set of
     *         nElements
     */
    private static int getChunkStartInclusive(final int chunk, final int nChunks, final int nElements)
    {
        final int chunkSize = getChunkSize(nChunks, nElements);
        return chunk * chunkSize;
    }

    /**
     * Computes the exclusive element index that the provided chunk ends at,
     * given there are a certain number of chunks.
     *
     * @param chunk The chunk to compute the end of
     * @param nChunks The number of chunks created
     * @param nElements The number of elements to chunk across
     * @return The exclusive end index for this chunk
     */
    private static int getChunkEndExclusive(final int chunk, final int nChunks, final int nElements)
    {
        final int chunkSize = getChunkSize(nChunks, nElements);
        final int end = (chunk + 1) * chunkSize;
        if (end > nElements)
        {
            return nElements;
        }
        else
        {
            return end;
        }
    }

    /**
     * This class stub can be filled in to implement the body of each task
     * created to perform reciprocal array sum in parallel.
     */
    private static class ReciprocalArraySumTask extends RecursiveAction
    {
        /**
         * Starting index for traversal done by this task.
         */
        private final int startIndexInclusive;
        /**
         * Ending index for traversal done by this task.
         */
        private final int endIndexExclusive;
        /**
         * Input array to reciprocal sum.
         */
        private final double[] input;
        /**
         * Intermediate value produced by this task.
         */
        private double value;

        /**
         * Constructor.
         * @param setStartIndexInclusive Set the starting index to begin
         *        parallel traversal at.
         * @param setEndIndexExclusive Set ending index for parallel traversal.
         * @param setInput Input values
         */
        ReciprocalArraySumTask(final int setStartIndexInclusive, final int setEndIndexExclusive, final double[] setInput)
        {
            this.startIndexInclusive = setStartIndexInclusive;
            this.endIndexExclusive = setEndIndexExclusive;
            this.input = setInput;
        }

        /**
         * Getter for the value produced by this task.
         * @return Value produced by this task
         */
        public double getValue()
        {
            return value;
        }

        @Override
        protected void compute()
        {
            // TODO
            if((endIndexExclusive - startIndexInclusive) <= (input.length/5))
            {
                for(int i=startIndexInclusive;i<endIndexExclusive;i++)
                {
                    value+=(1/input[i]);
                }
            }
            else
            {
                int mid = ((startIndexInclusive+endIndexExclusive)/2);
                ReciprocalArraySumTask l = new ReciprocalArraySumTask(startIndexInclusive, mid, input);
                ReciprocalArraySumTask r = new ReciprocalArraySumTask(mid, endIndexExclusive, input);
                l.fork();
                r.fork();
                l.join();
                r.join();
                value = l.getValue()+r.getValue();
            }
        }
    }

    /**
     * TODO: Modify this method to compute the same reciprocal sum as
     * seqArraySum, but use two tasks running in parallel under the Java Fork
     * Join framework. You may assume that the length of the input array is
     * evenly divisible by 2.
     *
     * @param input Input array
     * @return The sum of the reciprocals of the array input
     */
    protected static double parArraySum(final double[] input)
    {
        assert input.length % 2 == 0;
        double sum = 0.0;
        // Compute sum of reciprocals of array elements
        /*for(int i=0;i<input.length;i++)
        {
            sum+=(1/input[i]);
        }*/
        ReciprocalArraySumTask t = new ReciprocalArraySumTask(0,input.length,input);
        t.compute();
        sum = t.getValue();
        return sum;
    }

    /**
     * TODO: Extend the work you did to implement parArraySum to use a set
     * number of tasks to compute the reciprocal array sum. You may find the
     * above utilities getChunkStartInclusive and getChunkEndExclusive helpful
     * in computing the range of element indices that belong to each chunk.
     *
     * @param input Input array
     * @param numTasks The number of tasks to create
     * @return The sum of the reciprocals of the array input
     */
    protected static double parManyTaskArraySum(final double[] input, final int numTasks)
    {
        double sum = 0;

        // Compute sum of reciprocals of array elements
        for (int i = 0; i < input.length; i++)
        {
            sum += 1 / input[i];
        }

        return sum;
    }

    public static void main(String[] args)
    {
        ReciprocalArraySum r = new ReciprocalArraySum();
        double[] imp = new double[1000];
        Random rGen = new Random();
        for(int i=0;i<1000;i++)
        {
            imp[i] = (i+1);
        }
        long startTimeSeq = System.nanoTime();
        double value = r.seqArraySum(imp);
        long endTimeSeq = System.nanoTime();
        System.out.printf("Time for sequential: %4.3f , Value: %f\n",(endTimeSeq-startTimeSeq)/1e6,value);

        long startTimePar = System.nanoTime();
        value = r.parArraySum(imp);
        long endTimePar = System.nanoTime();
        System.out.printf("Time for parallel: %4.3f , Value: %f",(endTimePar-startTimePar)/1e6,value);
    }
}
Saikumar
  • 3
  • 3
  • You haven't done this with enough iterations to warm up the JIT. You should look at using JMH for microbenchmarking instead of doing it yourself. Also, how many processor cores is your application using? – DodgyCodeException Feb 14 '18 at 10:23
  • 1
    Aside from checking [the way you benchmark](https://stackoverflow.com/questions/504103/how-do-i-write-a-correct-micro-benchmark-in-java), you could first see how it behaves with larger problems. 1000 items is not enough to offset the overhead of forking/joining. – Malte Hartwig Feb 14 '18 at 10:23
  • @MalteHartwig The assignment provides a JUnit test environment and tests it with 2 million items. It is still performing worse. – Saikumar Feb 14 '18 at 10:51
  • The test environment provided by the assignment may be running on a virtual machine which has only one processor core allocated. As this is a purely compute-bound task, you will not get any benefit at all from calculating in parallel on a single processor core. That's why I asked earlier how many cores you were using. – DodgyCodeException Feb 14 '18 at 10:59
  • @DodgyCodeException Yes, I am using 4 cores and the statement `System.setProperty(.....,"4")` had no effect on the execution time whatsoever. – Saikumar Feb 14 '18 at 11:02
  • @Saikumar just to be perfectly clear, if you put `System.out.println(Runtime.getRuntime().availableProcessors());` at the start of this application, does it say 4? (That's not the number of processors on your PC, but the number of processors available to this particular run of your application. – DodgyCodeException Feb 14 '18 at 11:13
  • @ Yes, it says 4. – Saikumar Feb 14 '18 at 11:18

1 Answers1

1

You're using join() for each fork(). join() stalls the program even using Java 8. Try using the CountedCompleter class. It's a form of scatter-gather and doesn't stall. It's the replacement for join() used in streams.

Michael
  • 41,989
  • 11
  • 82
  • 128
edharned
  • 1,884
  • 1
  • 19
  • 20