2

So, I was trying to write a program to do matrix multiplication using multiple threads and then plot a graph between the time taken and the number of threads used. I used the following approach:

#include <stdio.h>
#include <pthread.h>
#include <unistd.h>
#include <stdlib.h>
#include <time.h>
#include <sys/time.h>
pthread_mutex_t lock;

#define M 200
#define N 300
#define P 400
#define X 2 // Number of Threads
#define RED "\x1b[31m"
#define GREEN "\x1b[32m"

int A[M][N], B[N][P], C[M][P], D[M][P];

int row = 0;

void *matrixMulti(void *arg)
{
    pthread_mutex_lock(&lock);
    int i = row++;

    for (int j = 0; j < P; j++)
    {
        C[i][j] = 0;
        for (int k = 0; k < N; k++)
        {
            C[i][j] += A[i][k] * B[k][j];
        }
    }

    
    pthread_exit(NULL);
    pthread_mutex_unlock(&lock);
}

void matrixMultiplicationWithoutThreading();
void matrixMultiplicationWithThreading();
void verifyIfBothMatrixAreSame();
int main()
{
    int m, n, p;
    // A: m*n Matrix, B: n*p Matrix
    for (int i = 0; i < M; i++)
        for (int j = 0; j < N; j++)
            A[i][j] = rand() % 10;
    // scanf("%d", &A[i][j]);
    for (int i = 0; i < N; i++)
        for (int j = 0; j < P; j++)
            B[i][j] = rand() % 10;
    // scanf("%d", &B[i][j]);
    
    struct timeval start, end;
    gettimeofday(&start, NULL);
    matrixMultiplicationWithoutThreading();
    gettimeofday(&end, NULL);
    double time = (end.tv_sec - start.tv_sec) * 1e6;
    time = (time + end.tv_usec - start.tv_usec) * 1e-6;
    printf("The time taken by simple matrix calculation without threding is %0.6f\n", time);

    struct timeval start_th, end_th;
    gettimeofday(&start_th, NULL);
    matrixMultiplicationWithThreading();
    gettimeofday(&end_th, NULL);
    time = (end_th.tv_sec - start_th.tv_sec) * 1e6;
    time = (time + end_th.tv_usec - start_th.tv_usec) * 1e-6;
    printf("The time taken by using the Threading Method with %d threads is %0.6f\n", X, time);

    verifyIfBothMatrixAreSame();
}

void matrixMultiplicationWithThreading()
{
    pthread_t threads[X];
    for (int i = 0; i < X; i++)
    {
        threads[i] = (pthread_t)-1;
    }

    // Computation Started:
    for (int i = 0; i < M; i++)
    {

        // At any moment only X threads at max are working
        if (threads[i] == (pthread_t)-1)
            pthread_create(&threads[i % X], NULL, matrixMulti, NULL);
        else
        {
            pthread_join(threads[i % X], NULL);
            pthread_create(&threads[i % X], NULL, matrixMulti, NULL);
        }
    }
    for (int i = 0; i < X; i++)
        pthread_join(threads[i], NULL);
    // Computation Done:
}

void matrixMultiplicationWithoutThreading()
{
    // Computation Started:
    for (int i = 0; i < M; i++)
        for (int j = 0; j < P; j++)
        {
            D[i][j] = 0;
            for (int k = 0; k < N; k++)
                D[i][j] += A[i][k] * B[k][j];
        }
    // Computation Done:
}
void verifyIfBothMatrixAreSame()
{
    for (int i = 0; i < M; i++)
        for (int j = 0; j < P; j++)
        {
            if (C[i][j] != D[i][j])
            {
                printf(RED "\nMatrix's are not equal something wrong with the computation\n");
                return;
            }
        }
    printf(GREEN "\nBoth Matrixes are equal thus verifying the computation\n");
}

Now, this code works sometimes, and sometimes it doesn't, like the result does not match the actual result. Similarly, this code gives a segmentation fault in one of the Linux virtual machines. Also, even when it works correctly, it doesn't give the asymptotically decreasing graph. Instead, the time is almost constant with arbitrary variations with the thread number.

Can someone help with this, like why this is happening? I found multiple solutions to this problem on the internet; some of them don't work (rarely but it happens), but I haven't seen my approach yet; it might be an issue I think. So, can anyone comment on using pthread_create(&threads[i % X], NULL, matrixMulti, NULL), like why this is not a good idea?

EDITED: I have tried taking the suggestion and optimising the code, I have not done the Matrix multiplication efficient method, as we were asked to do the O(n^3) method, but I have tried doing the threading correctly. Is this correct?

#include <stdio.h>
#include <pthread.h>
#include <unistd.h>
#include <stdlib.h>
#include <time.h>
#include <sys/time.h>
#include <math.h>

#define M 2
#define N 2
#define P 2
#define X 40 // Number of Threads
#define RED "\x1b[31m"
#define GREEN "\x1b[32m"

int t = 0; // Computation done by the first usedXFullthreads
int usedXFull = 0;

int A[M][N], B[N][P], C[M][P], D[M][P];

int row = 0;

void *matrixMulti(void *arg)
{
    int* l = (int *)arg;
    int n = *l;
    int i = 0, j = 0, k = 0, comp = 0;
    if (n <= usedXFull)
    {
        i = n * t / (N * P);
        j = (n * t - N * P * i) / N;
        k = n * t - N * P * i - N * j;
        if (n == usedXFull)
            comp = M * N * P - usedXFull * t;
        else
            comp = t;
    }
    while (comp)
    {
        if (i == M)
            printf(RED "Some fault in the code\n\n");
        C[i][j] += A[i][k] * B[k][j];
        comp--;
        k++;
        if (k == N)
        {
            j++;
            if (j == P)
            {
                i++;
                j = 0;
            }
            k = 0;
        }
    }

    return NULL;
}

void matrixMultiplicationWithoutThreading();
void matrixMultiplicationWithThreading();
void verifyIfBothMatrixAreSame();
int main()
{
    int m, n, p;
    // A: m*n Matrix, B: n*p Matrix
    for (int i = 0; i < M; i++)
        for (int j = 0; j < N; j++)
            A[i][j] = rand() % 10;
    // scanf("%d", &A[i][j]);
    for (int i = 0; i < N; i++)
        for (int j = 0; j < P; j++)
            B[i][j] = rand() % 10;
    // scanf("%d", &B[i][j]);
    for (int i = 0; i < M; i++)
        for (int j = 0; j < P; j++)
            C[i][j] = 0;

    struct timeval start, end;
    gettimeofday(&start, NULL);
    matrixMultiplicationWithoutThreading();
    gettimeofday(&end, NULL);
    double time = (end.tv_sec - start.tv_sec) * 1e6;
    time = (time + end.tv_usec - start.tv_usec) * 1e-6;
    printf("The time taken by simple matrix calculation without threding is %0.6f\n", time);

    struct timeval start_th, end_th;
    gettimeofday(&start_th, NULL);
    matrixMultiplicationWithThreading();
    gettimeofday(&end_th, NULL);
    time = (end_th.tv_sec - start_th.tv_sec) * 1e6;
    time = (time + end_th.tv_usec - start_th.tv_usec) * 1e-6;
    printf("The time taken by using the Threading Method with %d threads is %0.6f\n", X, time);

    verifyIfBothMatrixAreSame();
}

void matrixMultiplicationWithThreading()
{
    int totalComp = M * N * P; // Total Computation
    t = ceil((double)totalComp / (double)X);
    usedXFull = totalComp / t;
    int computationByLastUsedThread = totalComp - t * usedXFull;
    int computationIndex[X];

    pthread_t threads[X];

    // Computation Started:
    for (int i = 0; i < X; i++)
    {
            computationIndex[i] = i;
        int rc = pthread_create(&threads[i], NULL, matrixMulti, (void *)&computationIndex[i]);
        if (rc)
        {
            printf(RED "ERROR; return code from pthread_create() is %d\n", rc);
            exit(-1);
        }
    }
    for (int i = 0; i < X; i++)
        pthread_join(threads[i], NULL);
    // Computation Done:
}

void matrixMultiplicationWithoutThreading()
{
    // Computation Started:
    for (int i = 0; i < M; i++)
        for (int j = 0; j < P; j++)
        {
            D[i][j] = 0;
            for (int k = 0; k < N; k++)
                D[i][j] += A[i][k] * B[k][j];
        }
    // Computation Done:
}
void verifyIfBothMatrixAreSame()
{
    for (int i = 0; i < M; i++)
        for (int j = 0; j < P; j++)
        {
            if (C[i][j] != D[i][j])
            {
                printf(RED "\nMatrix's are not equal something wrong with the computation\n");
                return;
            }
        }
    printf(GREEN "\nBoth Matrixes are equal thus verifying the computation\n");
}
  • 1
    There are several mistakes in your thread handling.... For instance: `pthread_t threads[X];` gives you an array with **2 elements**. Later you access it using `threads[i]` where `i` is controlled by `for (int i = 0; i < M; i++)`. `M` is 200 so you access the array out of bounds – Support Ukraine Aug 31 '22 at 19:00
  • 1
    I notice that you call `pthread_exit(NULL)` before you release your mutex. Also, is there a point in threading if the entirety of the threaded process is a critical section, only allowing one to actually do anything at a time? – Christian Gibbons Aug 31 '22 at 19:01
  • The mutex part is a big mistake as it prevents on multi threading – Support Ukraine Aug 31 '22 at 19:08
  • And the `row` variable is a no-go... pass `i` to the individuak thread as argument – Support Ukraine Aug 31 '22 at 19:09
  • 2
    The constant create and join of threads is a mistake. It kill performance. If you want 2 threads there shall only be 2 create and 2 join. Each thread shall then handle M/2 lines. If you want 4 threads there shall only be 4 create and 4 join. Each thread shall then handle M/4 lines. And so on...... – Support Ukraine Aug 31 '22 at 19:15
  • Also true. I've made the mistake of constantly spawning and reaping threads in the name of benchmarking serial vs. parallel processing before and all that thread overhead killed performance. – Christian Gibbons Aug 31 '22 at 19:17
  • BTW read this: https://stackoverflow.com/questions/5362577/c-gettimeofday-for-computing-time – Support Ukraine Aug 31 '22 at 19:19
  • @SupportUkraine I was thinking of processing ceil(M*N*P/X) computations on each thread. (The latter threads might do less work this way). But, how do I go about it? Like if for each thread I define a i, j, k and then process ceil(M*N*P/X), I can achieve this, but then I would have to pass a struct kind of thing for each thread, which again I have to precompute this, which would again taking quite some time thus killing the performance. Like the first thread say does only compute half of the calculations to get the first element of the new matrix. Is there any better way? – Vedanta Mohapatra Sep 01 '22 at 07:26
  • @SupportUkraine I have tried the threading again, Is this correct? – Vedanta Mohapatra Sep 01 '22 at 12:37

1 Answers1

6

There are many issues in the code. Here are some points:

  • lock is not initialized with pthread_mutex_init which is required (nor freed).
  • There is no need for locks in a matrix multiplication: work sharing should be preferred instead (especially since the current lock make your code run fully serially).
  • Using pthread_exit is generally rather a bad idea, at least it is here. Consider just returning NULL. Besides, returning something is mandatory in matrixMulti. Please enable compiler warnings so to detect such a thing.
  • There is an out of bound of threads[i] in the 0..M based loop.
  • There is no need to create M threads. You can create 2 threads and divide the work in 2 even parts along the M-based dimension. Creating M threads while allowing only 2 threads to run simultaneously just add more overhead for no reason (it takes time for thread to be created and scheduled by the OS).
  • It is generally better to dynamically allocate large arrays than using static global C arrays.
  • It is better to avoid global variables and use the arg parameter so to get thread-specific data.

To design a fast matrix multiplication, please consider reading this article. For example, the ijk loop nest is very inefficient and should really not be used for sake of performance (not efficient in cache). Besides, note you can use a BLAS library for that (they are highly optimized and easy to use) though I guess this is a homework. Additionally, note that you can use OpenMP instead of pthread so to make the code shorter and easy to read.

Jérôme Richard
  • 41,678
  • 6
  • 29
  • 59