
I have the following code of filling an array with multiple threads:

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>

#define MAX_ITEMS 67108864
#define LINES_PER_THREAD 8388608
#define THREADS 8
static int *array;

static pthread_t pids[THREADS];
static int args[THREADS];


static void *init_array_line(void *arg) {
    int i, max;
    i = *(int *)arg;
    max = i + LINES_PER_THREAD;
    for (; i < max; i++)
        array[i] = rand() % 10000 + 1;
    return NULL;
}

static void init_array(void) {
    int i;
    for (i = 0; i < THREADS; i++) {
        args[i] = i * LINES_PER_THREAD;
        pthread_create(pids + i, NULL, init_array_line, args + i);
    }
}

static void wait_all(void) {
    for (int i = 0; i < THREADS; i++) {
        pthread_join(pids[i], NULL);
    }
}

int
main(int argc, char **argv)
{
    array = (int *)malloc(MAX_ITEMS * sizeof(int));
    init_array();
    wait_all();
}

I am giving each thread 1/8 of the array (LINES_PER_THREAD elements) to fill, but it seems to take longer than filling it sequentially. Any suggestions as to why this might be?

C-PROG
    Is there any reason why you like the keyword `static` so much? Anyway - you do not need to cast `malloc` in C - see [this](http://stackoverflow.com/questions/605845/do-i-cast-the-result-of-malloc) – Ed Heal Jan 01 '17 at 15:56
  • "it seems that it takes longer than filling it normally". How do you measure? – n. m. could be an AI Jan 01 '17 at 16:00
  • I am compiling with: gcc q.c -lm -std=c99 -o q -lpthread and measuring with time ./q – C-PROG Jan 01 '17 at 16:02

1 Answer


I suspect the main bottleneck is the calls to rand(). rand() isn't required to be thread-safe, so it can't be used safely when multiple threads may call it concurrently. The glibc implementation guards its internal state with a lock, which effectively serializes every call to rand() across all threads and defeats the parallelism of your program. Use rand_r() instead: it keeps no internal state (the caller supplies the seed), so each thread can draw numbers without contention, which should at least fix this aspect of your problem.

In general, if the threads don't do enough work each, the thread creation and synchronization overhead can outweigh whatever concurrency a multi-core system makes available.

P.P
  • This works. Now the execution time is 5 sec instead of 25 sec. Thank you. – C-PROG Jan 01 '17 at 16:23
  • But still, if I run the program normally without any threads it takes 1.2 sec, while with threads it takes 5 sec. So like you said, the problem is not big enough for 8 threads. – C-PROG Jan 01 '17 at 16:34
  • You might want to examine your code some more. I translated the program into Ada and my time for 8 tasks is 0.37 seconds while my time for the sequential version is 2.01 seconds. – Jim Rogers Jan 01 '17 at 22:11
  • @JimRogers I did some more testing and the wait_all() function takes 4,7 seconds. That is the reason why the parallel version takes more time than the sequential one for me. Perhaps any ideas how to resolve that? I am quite new to parallel programming. – C-PROG Jan 02 '17 at 11:10
  • @io16 The Ada language does not have a direct equivalent to the wait_all() function, so I wrote my own. Testing shows that my version, for Ada, takes 0,49 seconds and the overall run takes 0,89 seconds. I was too late to edit my previous timings. Setting up the Ada tasks, which are equivalent to threads, took 0,37 seconds. I have the program report its timing as its output. – Jim Rogers Jan 02 '17 at 14:55