Shared Data in pthread Programming

Question

There's something I'm still not very sure about in pthread programming. And I'll appreciate if someone can tell me an absolute answer.

My previous question is here: How do I assign array variable in a simple Pthread programming?

And now, I'm working on matrix multiplication. It works well using this:

typedef struct {
    int rowIdx;
    int (*matA)[SIZE], (*matB)[SIZE], (*matC)[SIZE];
} newType; 

int main (){
    int matriksA[SIZE][SIZE];
    int matriksB[SIZE][SIZE];
    int matriksC[SIZE][SIZE];

    for (i=0;i<NUM_THREAD;i++) {
         (*block).rowIdx = i;
         (*block).matA = matriksA;
         (*block).matB = matriksB;
         (*block).matC = matriksC;
         pthread_create(&arrThread[i], NULL, matrixMul, (void *)block);
         block++;
    }
}

void *matrixMul(void *x){
    newType *p = (newType *) x;
    int i = (*p).rowIdx;
    int j,k;
    for (j=0;j<SIZE;j++){
        int result = 0;
        for(k=0;k<SIZE;k++){
            int MAik = (*p).matA[i][k];
            int MBkj = (*p).matB[k][j];
            result = result + (MAik*MBkj);
        }
        (*p).matC[i][j] = result;
    }
    pthread_exit(NULL);
}

The matrixMul is doing the matrix multiplication matC = matA x matB.

I tried using this struct before but it didn't work.

typedef struct {
    int rowIdx;
    int **matA, **matB, **matC;
} newType;

Apparently from what I've read, a variable array can be considered as a pointer that holds the address of the first element in the array. As for 2-dimensional array, we must tell the compiler the size of the column. Thus we must use (*matA)[SIZE] instead of **matA in the typedef struct.

But I'm still not sure about what I was doing there. Did I just copy a 2D-array to another 2D-array by assigning its pointer or what? Kinda confusing....LOL...

My next questions is regarding these lines:

(*block).matA = matriksA;
(*block).matB = matriksB;
(*block).matC = matriksC;

What actually happened there? Do each block variable in the code above has its own copy of matrix data? Or do they just share it by having their pointer refer to the same location in memory which means matA, matB, and matC behave like a static variable (as in object oriented programming)? In other words, is there only one copy of matA, matB, and matC; and do the threads access the shared data simultantly? Or is there many copies of 'matA's, and each of them has its own different allocation in RAM?
Same question with my first post, what happened behind these lines? (*z).arrA = arrayA; (*z).arrB = arrayB; (*z).arrC = arrayC;
Are the codes above efficient enough to do the task (array addition and matrix multiplication)? Or is there another way which more efficient from the memory-allocation point of view?

@Code-Guru: there I've posted the new question.

`(*block).matA` is equivalent to `block->matA`. You could use SIMD instructions to do this more efficiently. There are many optimizations for special purpose matrix, like spare or symmetric. The general one must do the work to be done. You can partition the multiplication into separate threads or code directly to the machine level. Many machines have *multiply and accumulate* instructions. A decent compiler will recognize this and use it with the code you have. The only other trick is to prime the loop to get optimum memory access. — artless noise, Mar 25 '13 at 01:27
Sorry, that is **sparse** matrix. As you can see by this [stack overflow question](http://stackoverflow.com/questions/11110604/why-is-boosts-matrix-multiplication-slower-than-mine), your method is already pretty good. — artless noise, Mar 25 '13 at 01:40
The code you have at the top is indeed the correct declaration. They are pointers, to be sure. Like all non-void pointer, they are typed. In this case they are typed to be "pointer to an array of SIZE-integers". You are correct the second decl (the one you tried without success) is incorrect. Those are not pointers to arrays; the are pointers to pointers, often used for mimicking multi-dimenions array allocations that are NOT contiguous. The most immediate problem I see in your code is the incrementing of `block`. Since there is no decl, I can't tell if there are N of those, one per thread. — WhozCraig, Mar 25 '13 at 02:05

Jonathan Leffler · Answer 1 · 2013-03-25T02:40:37.017

The basic problem in threaded programming is to make sure that there is no possibility of two separate threads trying to modify the data at the same time, or read data which another thread might be modifying. (If there is no danger that the data might be modified, it is perfectly OK for two threads to read the same non-changing data at the same time.)

Applied to your matrix multiplication problem, the threads are only going to read matrix A and matrix B, so you don't have to control access to those variables — assuming that they are initialized before you launch the threads.

On the other hand, the result Matrix C, will be being accessed for write, so you either have to be sure that you have partitioned the workload so that no two threads will ever access the same elements (they are working on disjoint subsets of Matrix C), or you have to coordinate access so that only one thread is modifying a given cell at a given time, and you've enforced this with mutual exclusion (a mutex) or something equivalent.

Your questions

You've not shown how block is defined, amongst other things. It is helpful if you show us an SSCCE (Short, Self-Contained, Correct Example), like the one I show you below. It saves us having to reverse-engineer your code fragment into working code. Done properly, it does not take up much space. (An earlier edition of this answer went off on a tangent because the code was not complete!)

In your original, you created NUM_THREAD threads to process SIZE x SIZE matrices. Since you didn't show the definition of SIZE or NUM_THREAD, we have to assume that the two sizes were equal. Various different recipes for disaster were available depending on the relative sizes of the two constants.

The threads are all being given the same matrices to work on, which is what you were really asking about. Each thread has a pointer to the same memory.
Assuming the (*z).arrA = arrayA; you refer to is the (*block).arrA = matriksA; assignment, then the you're assigning a pointer to an array of SIZE integers to block->arrA (which is equivalent to (*block).arrA). That's a little contorted, but legitimate. You'll need to be careful using that.
You ask if the code is efficient enough. First sub-question: does it produce the correct answer (and is that guaranteed)? I'm not yet sure about that. However, if each thread is working on one column of the result matrix, that should be safe enough.

SSCCE

This code uses C99 constructs. It won't compile under C89.

#include <stdio.h>
#include <pthread.h>

enum { SIZE = 3 };

typedef struct
{
    int   rowIdx;
    int (*matA)[SIZE];
    int (*matB)[SIZE];
    int (*matC)[SIZE];
} newType; 

extern void *matrixMul(void *);

static void print_matrix(const char *tag, int d1, int d2, int matrix[d1][d2])
{
    printf("%s: (%d x %d)\n", tag, d1, d2);
    for (int i = 0; i < d1; i++)
    {
        printf("%d:", i);
        for (int j = 0; j < d2; j++)
            printf(" %6d", matrix[i][j]);
        putchar('\n');
    }
}

int main(void)
{
    int matriksA[SIZE][SIZE] = { {  1,  2,  3 }, {  4,  5,  6 }, {  7,  8,  9 } };
    int matriksB[SIZE][SIZE] = { { 11, 12, 13 }, { 14, 15, 16 }, { 17, 18, 19 } };
    int matriksC[SIZE][SIZE];
    newType  thedata[SIZE];
    newType *block = thedata;
    pthread_t arrThread[SIZE];

    for (int i = 0; i < SIZE; i++)
    {
         block->rowIdx = i;
         block->matA = matriksA;
         block->matB = matriksB;
         block->matC = matriksC;
         //matrixMul(block);
         pthread_create(&arrThread[i], NULL, matrixMul, block);
         block++;
    }

    for (int i = 0; i < SIZE; i++)
        pthread_join(arrThread[i], 0);

    print_matrix("Matrix A", SIZE, SIZE, matriksA);
    print_matrix("Matrix B", SIZE, SIZE, matriksB);
    print_matrix("Matrix C", SIZE, SIZE, matriksC);
}

void *matrixMul(void *x){
    newType *p = (newType *) x;
    int i = p->rowIdx;
    for (int j = 0; j < SIZE; j++)
    {
        int result = 0;
        for(int k = 0; k < SIZE; k++)
        {
            int MAik = p->matA[i][k];
            int MBkj = p->matB[k][j];
            result += MAik * MBkj;
        }
        p->matC[i][j] = result;
    }
    //pthread_exit(NULL);
    return(0);
}

You might note that I've added a matrix printing function (and used it). I also added sample data for a pair of 3x3 matrices, and I've verified that the answer is correct. I did the testing in two steps:

Check that a single-threaded version of the code produced the correct answer.
Add threading.

If step 1 produced the wrong answer, you know that you've only got basic calculations to fix; it isn't a thread-induced problem (because there is only one thread!). Fortunately, it produces the right answer. Adding the threading was then simple.

Output

$ ./ta
Matrix A: (3 x 3)
0:      1      2      3
1:      4      5      6
2:      7      8      9
Matrix B: (3 x 3)
0:     11     12     13
1:     14     15     16
2:     17     18     19
Matrix C: (3 x 3)
0:     90     96    102
1:    216    231    246
2:    342    366    390
$

Shared Data in pthread Programming

1 Answers1

Your questions

SSCCE

Linked