So, I am practicing multi threading in java and trying to add the elements of a randomly generated 2D integer array both sequentially and using 4 threads. I measured the performance of my code and for some reason sequential part is a lot faster than multi-threaded. Here is the code for sequential addition:
public class ArraySum2DNonMT {
private int[][] arrayToSum;
private int totalSum;
public ArraySum2DNonMT(int[][] arr){
this.arrayToSum = arr;
this.setTotalSum(0);
}
public void runSequential(){
for(int i = 0; i < arrayToSum[0].length; i++){
for(int j = 0; j < arrayToSum.length; j++){
setTotalSum(getTotalSum() + arrayToSum[j][i]);
}
}
}
public int getTotalSum() {
return totalSum;
}
public void setTotalSum(int totalSum) {
this.totalSum = totalSum;
}
}
Here is the code for Multi-threaded version:
package multiThreaded;
/**
*
* @author Sahil Gupta
*
* This class takes in a 2D integer array and adds it's contents. This
* addition will be concurrent between several threads which will divide
* the work of the array based on the threadID assigned to thread by the
* programmer. Assume that the passed in 2D array to the constructor is a
* matrix with each array in the main array having same length.
*/
public class ArraySum2D implements Runnable{
private int[][] arrayToSum;
private int threadID;
private int totalSum;
public ArraySum2D(int[][] arr, int threadID){
this.arrayToSum = arr;
this.threadID = threadID;
this.setTotalSum(0);
}
@Override
public void run() {
int arrayCol = arrayToSum[0].length;
int arrayRow = arrayToSum.length;
int colStart = (int)((threadID%2) * (arrayCol/2));
int rowStart = (int)((int)(threadID/2) * (arrayRow/2));
int colEnd = colStart + (int)(arrayCol/2);
int rowEnd = rowStart + (int)(arrayRow/2);
for(int i = colStart; i < colEnd; i++){
for(int j = rowStart; j < rowEnd; j++){
setTotalSum(getTotalSum() + arrayToSum[j][i]);
}
}
}
public int getTotalSum() {
return totalSum;
}
public void setTotalSum(int totalSum) {
this.totalSum = totalSum;
}
}
Here is the main:
package controller;
import java.util.Random;
import multiThreaded.ArraySum2D;
import sequentialNonMT.ArraySum2DNonMT;
public class ControllerMain {
private final static int cols = 20;
private final static int rows = 10;
private static volatile int[][] arrayToAdd = new int[rows][cols];
private static Random rand = new Random();
private static ArraySum2D a0, a1, a2, a3;
public static void main(String[] args) throws InterruptedException{
for(int j = 0; j < rows; j++){
for(int i = 0; i < cols; i++){
arrayToAdd[j][i] = rand.nextInt(100);
}
}
ArraySum2DNonMT a = new ArraySum2DNonMT(arrayToAdd);
long startTimeSequential = System.nanoTime();
a.runSequential();
long estimatedTimeSequential = System.nanoTime() - startTimeSequential;
System.out.println("The total sum calculated by sequential program is: " + a.getTotalSum());
System.out.println("The total time taken by sequential program is: " + estimatedTimeSequential);
a0 = new ArraySum2D(arrayToAdd, 0);
a1 = new ArraySum2D(arrayToAdd, 1);
a2 = new ArraySum2D(arrayToAdd, 2);
a3 = new ArraySum2D(arrayToAdd, 3);
Thread t0 = new Thread(a0);
Thread t1 = new Thread(a1);
Thread t2 = new Thread(a2);
Thread t3 = new Thread(a3);
long startTimeMultiThreaded = System.nanoTime();
t0.start();
t1.start();
t2.start();
t3.start();
t0.join();
t1.join();
t2.join();
t3.join();
int Sum = addThreadSum();
long estimatedTimeMultiThreaded = System.nanoTime() - startTimeMultiThreaded;
System.out.println("The total sum calculated by multi threaded program is: " + Sum);
System.out.println("The total time taken by multi threaded program is: " + estimatedTimeMultiThreaded);
}
private static int addThreadSum(){
return a0.getTotalSum() + a1.getTotalSum() + a2.getTotalSum() + a3.getTotalSum();
}
}
The output I am currently getting shows significant difference in runtime (measured here in nanoseconds). This is what I get:
The total sum calculated by sequential program is: 10109
The total time taken by sequential program is: 46000
The total sum calculated by multi threaded program is: 10109
The total time taken by multi threaded program is: 641000
The sequential code is around 13 times faster. Can you please help me point out what I might not be doing correctly? I have a dual core i7 haswell, macbook air. I am not sure why it is taking longer, but here are a few ideas I have in mind which might be causing this: False sharing, Too much parallelism/threads (4 for dual core), Cache coherence protocol might not be favoring me, some other basic multi threaded stuff I am missing/don't know about.
Please help me identify the specific cause and ways I can make multi threaded run faster than sequential. Thank you so much for helping me out!
Edit: More info about the processor and it's caches: Processor Name: Intel Core i7 Processor Speed: 1.7 GHz Number of Processors: 1 Total Number of Cores: 2 L2 Cache (per Core): 256 KB L3 Cache: 4 MB
I think this can have up to 4 threads according to Intel's data sheet.
P.S. This is my first post to ask question but I have been using this site to clear doubts for a while now. Forgive me for any mistakes I make.