I have a question for you about a concurrent software on java.
The main algorithm of my application have to factor an A
matrix into LU
matrices : A = LU. The decompose method pasted here, makes the Gauss-Jordan reduction. The software is designed to work on square matrix with the position A[0][0] != 0
.
Unfortunately, for the correct working of the algorithm, I have to wait that every row actualizes the values.
I tried to make this algorithm parallel using a barrier (for wait the actualization of every row and increment the "rigaCorrente" value) but I don't gain a real speedUp even if in the parallel version my processors (2.4 GHz Core 2 Duo P8600) work at 80% of the total (instead of 40-50% of the serial one).
My fear is that I'm encountering a false sharing situation. Is the problem related to the false sharing or to other matters? I know that the JVM performs good optimizations but it still uses the half power of the processor. I don't think it's impossible improve the algorithm!
My serial code:
public void decompose(){
int n = A.length;
for(int k=0; k<n-1;k++) {
for(int i=k+1; i<n; i++) {
A[i][k] = A[i][k]/A[k][k];
for(int j=k+1; j<n; j++) {
A[i][j] = A[i][j] - A[i][k] * A[k][j];
}
}
}
decomposed = true;
}
My parallel code:
public class Manager {
private double [][] A;
private Semaphore main = new Semaphore(0);
private CyclicBarrier barriera;
private AtomicInteger index;
private int rigaCorrente = 0;
private boolean stop = false;
public Manager(final double A[][], final int numThr){
this.A = A;
this.index = new AtomicInteger(1);
barriera = new CyclicBarrier(numThr, new Runnable(){
@Override
public void run() {
rigaCorrente++;
index = new AtomicInteger(rigaCorrente+1);
if(rigaCorrente == A.length - 1){
setStop(true);
main.release();
}
}
});
}
The thread class:
public class Deco implements Runnable {
private Manager manager;
public Deco(Manager manager){
this.manager = manager;
}
@Override
public void run() {
double [][] A = manager.getA();
while(manager.getStop() == false){
int i;
while((i = (manager.getIndex().getAndIncrement())) < (A.length)){
double pivot = A[i][manager.getRigaCorrente()]/A[manager.getRigaCorrente()] [manager.getRigaCorrente()];
for(int k = manager.getRigaCorrente(); k<A.length; k++)
A[i][k] = A[i][k] - (pivot*A[manager.getRigaCorrente()][k]);
A[i][manager.getRigaCorrente()] = pivot;
}
manager.acquireBarriera();
}// Stop
}
}
Main for parallel code:
package luConcurrent.test;
import java.util.Arrays;
import java.util.Scanner;
import lu.DecompositionLU;
import lu.IO;
public class Starter {
private static IO io;
private static DecompositionLU dec;
public static void main(String[] args) throws Exception {
io = new IO("//Users//black//Desktop//serie//2500.txt");
int numThr = 2;
double [][] A = io.readMatrixFromInputFile();
double [] b = io.readArrayFromInputFile();
double [] x;
dec = new DecompositionLU(A);
System.out.println("A.length: "+A.length);
Manager manager = new Manager(A,numThr);
Thread[] pool = new Thread[numThr];
for(int i=0; i<pool.length; i++){
pool[i] = new Thread(new Deco(manager));
}
long inizio = System.nanoTime();
for(int i = 0; i<pool.length; i++){
pool[i].start();
}
manager.getMain().acquire();
dec.ProgresiveSustitution(b);
x = dec.RegresiveSustitution(b);
long fine = System.nanoTime()-inizio;
System.out.println("Solution is: "+Arrays.toString(x));
Scanner sc = new Scanner(System.in);
sc.nextLine();
System.out.println("Tempo: "+fine);
sc.close();
}
}
Results:
1000x1000 Serial: 1154679000 nanoSec
1000x1000 Parallel 2 Threads: 1135663000 nanoSec
1750x1750 Serial: 7502559000 nanoSec
1750x1750 Parallel 2 Threads: 6667129000 nanoSec
4000x4000 Serial: 89851311000 nanoSec
4000x4000 Parallel 2 Threads: 84348616000 nanoSec