
I'm trying to improve the performance of my matrix multiplication code, but when I added threads, performance got worse.

First version:

public int[][] calculate(int[][] matriz1, int[][] matriz2, int matrixSize) {
    int[][] matrix = new int[matrixSize][matrixSize];

    for(int i = 0; i < matrixSize; i++){
        for(int k = 0; k < matrixSize; k++){
            for(int j = 0; j < matrixSize; j++){
                matrix[i][j] = matrix[i][j] + matriz1[i][k] * matriz2[k][j];
            }
        }
    }
    return matrix;
}

Second version:

public int[][] calculate(int[][] matriz1, int[][] matriz2, int matrixSize) {
    final int[][] matrix = new int[matrixSize][matrixSize];

    CountDownLatch latchA = new CountDownLatch((int) (Math.pow(matrixSize, 3)));
    List<Thread> threads = new ArrayList<>();

    for (int i = 0; i < matrixSize; i++) {
        final int finalI = i; // effectively final copy of i so the lambda can capture it
        Thread thread1 = new Thread(() -> {

            for (int k = 0; k < matrixSize; k++) {

                for (int j = 0; j < matrixSize; j++) {

                    matrix[finalI][j] = matrix[finalI][j] + matriz1[finalI][k] * matriz2[k][j];
                    latchA.countDown();

                }
            }
        });
        thread1.start();
        threads.add(thread1);
        if (threads.size() % 100 == 0) {
            waitForThreads(threads);
        }
    }
    try {
        latchA.await();
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
    return matrix;
}

private void waitForThreads(List<Thread> threads) {
    for (Thread thread : threads) {
        try {
            thread.join();
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
    threads.clear();
}

I also tried moving the work into a new class that implements the Runnable interface, but performance dropped even further.

The measured times for the two versions are:

First version: 0.0094

Second version: 1.5917

I'm studying how to take advantage of the CPU cache, and so far the first (single-threaded) version has performed best of everything I tried.
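For context on the cache point, here is a minimal sketch that contrasts the i-k-j loop order used in the first version with the textbook i-j-k order. The class and method names (`LoopOrderSketch`, `multiplyIkj`, `multiplyIjk`) are hypothetical and not taken from the repository. It relies on the fact that a Java `int[][]` is an array of separate row arrays, so an inner loop over `j` walks one row sequentially, while an inner loop over `k` on `b[k][j]` touches a different row array at every step, which is typically less cache-friendly.

```java
final class LoopOrderSketch {

    // i-k-j order, as in the first version: the inner loop walks
    // b[k] and out[i] from left to right, one row array at a time.
    static int[][] multiplyIkj(int[][] a, int[][] b, int n) {
        int[][] out = new int[n][n];
        for (int i = 0; i < n; i++) {
            int[] outRow = out[i];
            for (int k = 0; k < n; k++) {
                int aik = a[i][k];   // constant for the whole inner loop
                int[] bRow = b[k];
                for (int j = 0; j < n; j++) {
                    outRow[j] += aik * bRow[j];
                }
            }
        }
        return out;
    }

    // Textbook i-j-k order: the inner loop reads b[k][j] with k changing,
    // so every step jumps to a different row array of b.
    static int[][] multiplyIjk(int[][] a, int[][] b, int n) {
        int[][] out = new int[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                int sum = 0;
                for (int k = 0; k < n; k++) {
                    sum += a[i][k] * b[k][j];
                }
                out[i][j] = sum;
            }
        }
        return out;
    }
}
```

Both methods compute the same product; only the memory access pattern differs, so any speed difference between them would have to be measured on the target machine.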

The repository is https://github.com/Borges360/matrix-multiplication.

In C, running the loop across multiple threads improves performance a lot.

The explanation in C: https://www.youtube.com/watch?v=o7h_sYMk_oc

Alexander Ivanchenko
  • 1
    Those results (especially the first one) make me wonder how you're testing this. Please take a read at [What is microbenchmarking](https://stackoverflow.com/questions/2842695/what-is-microbenchmarking) – Federico klez Culloca Jun 27 '22 at 17:30
  • 1
    Does this answer your question? [Parallellize a for loop in Java using multi-threading](https://stackoverflow.com/questions/43655768/parallellize-a-for-loop-in-java-using-multi-threading) – Suma Jun 27 '22 at 17:31
  • 1
    For CPU bound tasks there is absolutely no benefit in having more threads than (virtual) cores. It is just a direct performance drop to have more threads because context switching between the threads takes time. And you added some synchronization using the latch which is a ***gigantic*** performance hit considering the number of operations and simultaneous threads. – luk2302 Jun 27 '22 at 17:32
  • 4
    What's the size of the matrix? Creating threads takes time, so if you have a relatively small matrix, the time is spent on creating and starting those threads. Plus you countDown in the inner loop which has its own overhead. – akarnokd Jun 27 '22 at 17:32
  • Java Loom, not yet released, is something you should look at. – Sam Jun 27 '22 at 17:44
  • @akarnokd I tested many matrix sizes and the processing times made sense. Thanks for the help. – Luiz Felipe Cruz Borges Jun 27 '22 at 22:32
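The points raised in the comments (no more threads than cores, no synchronization inside the inner loop) can be sketched roughly as follows. This is only an illustration under stated assumptions, not a tested fix for the repository: the class name `ParallelMatrixSketch` and the helper `multiplyRows` are hypothetical, the matrices are assumed to be square with side `matrixSize`, and the pool size is simply taken from `Runtime.getRuntime().availableProcessors()`. Each task handles a contiguous block of rows, and the only waiting happens once per task through `Future.get()`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelMatrixSketch {

    // Hypothetical helper: multiplies rows [from, to) in the same i-k-j order
    // as the first version. Each task writes only its own rows of out.
    static void multiplyRows(int[][] a, int[][] b, int[][] out, int n, int from, int to) {
        for (int i = from; i < to; i++) {
            for (int k = 0; k < n; k++) {
                int aik = a[i][k];
                for (int j = 0; j < n; j++) {
                    out[i][j] += aik * b[k][j];
                }
            }
        }
    }

    static int[][] calculate(int[][] matriz1, int[][] matriz2, int matrixSize) {
        int[][] matrix = new int[matrixSize][matrixSize];
        // Assumption: one worker per available core, never more workers than rows.
        int cores = Runtime.getRuntime().availableProcessors();
        int workers = Math.max(1, Math.min(matrixSize, cores));
        int chunk = (matrixSize + workers - 1) / workers; // rows per task, rounded up
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int from = 0; from < matrixSize; from += chunk) {
                final int start = from;
                final int end = Math.min(matrixSize, from + chunk);
                futures.add(pool.submit(
                        () -> multiplyRows(matriz1, matriz2, matrix, matrixSize, start, end)));
            }
            // One wait per task instead of one countDown per multiplication.
            for (Future<?> future : futures) {
                future.get();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while multiplying", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A worker task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
        return matrix;
    }
}
```

For small matrices the cost of creating the pool may still outweigh the work itself, so whether this beats the single-threaded version depends on the matrix size and on how the timing is measured.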

0 Answers