I am trying to understand OpenMP, so I wrote a simple program that compares the run times of a parallelized matrix-vector multiplication. It runs with different matrix sizes (1024, 2048, 8192), different numbers of threads (1, 2, 4, 8) and different scheduling strategies (static, dynamic, guided). I ran the program on a machine with two cores and four hardware threads.
The times are:
Time for 1 threads with 1024 entries and scheduling 0: 26720 ticks
Time for 1 threads with 8192 entries and scheduling 0: 1486755 ticks
Time for 2 threads with 1024 entries and scheduling 0: 159161 ticks
Time for 2 threads with 8192 entries and scheduling 0: 22254787 ticks
But that does not make sense: the number of CPU ticks increases by a factor of roughly 5 to 15 when going from one thread to two. The times get a little better again with 4 and 8 threads.
The code is:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#else
#define omp_get_thread_num() 0
#endif

void matrix(unsigned int n)
{
    // The big arrays we want on the heap
    float *matrix = (float *)malloc(sizeof(float) * n * n);
    float *vector = (float *)malloc(sizeof(float) * n);
    float *result = (float *)malloc(sizeof(float) * n);

    // initialize matrix
    #pragma omp parallel for
    for (int row = 0; row < n; row++)
    {
        for (int column = 0; column < n; column++)
        {
            *(matrix + (row * n) + column) = rand();
        }
    }

    // initialize vectors
    #pragma omp parallel for
    for (int row = 0; row < n; row++)
    {
        *(vector + row) = rand();
        *(result + row) = 0;
    }

    // multiply
    #pragma omp parallel for
    for (int row = 0; row < n; row++)
    {
        for (int column = 0; column < n; column++)
        {
            float resultat = *(matrix + (row * n) + column) * *(vector + column);
            *(result + row) += resultat;
        }
    }
}

int main()
{
    time_t t_t;
    // Initialize the random number generator
    srand((unsigned)time(&t_t));

    unsigned int threads[] = {1, 2, 4, 8};
    unsigned int amounts[] = {1024, 2048, 8192};
    omp_sched_t schedules[] = {omp_sched_static,
                               omp_sched_dynamic,
                               omp_sched_guided};

    size_t size_threads = sizeof(threads) / sizeof(threads[0]);
    size_t size_amounts = sizeof(amounts) / sizeof(amounts[0]);
    size_t size_schedules = sizeof(schedules) / sizeof(schedules[0]);

    // Vary the number of threads
    for (int t = 0; t < size_threads; t++)
    {
        omp_set_num_threads(threads[t]);
        for (int a = 0; a < size_amounts; a++)
        {
            for (int s = 0; s < size_schedules; s++)
            {
                omp_set_schedule(schedules[s], 0);
                clock_t start_t = clock();
                matrix(amounts[a]);
                clock_t end_t = clock();
                printf("Time for %d threads with %d entries and scheduling %d: %ld ticks\n\a", threads[t], amounts[a], s, (end_t - start_t));
            }
        }
    }
    return 0;
}
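A side note on the scheduling runs: as far as the OpenMP API describes it, omp_set_schedule() only affects loops that carry a schedule(runtime) clause. The loops above have no schedule clause, so the three "scheduling" settings may all run with the implementation's default schedule. A minimal sketch of the multiply loop with that clause follows; the function name `multiply_rows` and the local accumulator `sum` are illustrative choices, not part of the program above.

```c
#include <assert.h>
#include <stddef.h>

/* result = matrix * vector for a row-major n x n matrix.
   multiply_rows is a hypothetical name used only for this sketch. */
void multiply_rows(const float *matrix, const float *vector,
                   float *result, unsigned int n)
{
    assert(matrix != NULL && vector != NULL && result != NULL);

    /* schedule(runtime) makes the loop use whatever omp_set_schedule()
       (or the OMP_SCHEDULE environment variable) selected. */
    #pragma omp parallel for schedule(runtime)
    for (long row = 0; row < (long)n; row++)
    {
        /* Accumulate in a local variable and store once per row. */
        float sum = 0.0f;
        for (unsigned int column = 0; column < n; column++)
        {
            sum += matrix[(size_t)row * n + column] * vector[column];
        }
        result[row] = sum;
    }
}
```

Compiled without OpenMP support the pragma is ignored and the loop runs serially, which gives a baseline to compare against.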
Is there a mistake in my code, or is there another explanation for this behavior?
Edit: I also tried the gettimeofday() function, like this:
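One thing that may be worth ruling out (an assumption, not a confirmed cause): rand() keeps hidden global state and is not required to be thread-safe, and some C libraries serialize it with a lock, so calling it from every thread in the initialization loops could make the threads wait on each other. The sketch below gives each row its own generator state instead. `lcg_next` and `fill_matrix` are hypothetical names, and the small linear congruential generator is only a stand-in for rand(), not a good random source.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for rand(): a tiny linear congruential generator
   whose state is owned by the caller, so threads share nothing. */
unsigned int lcg_next(unsigned int *state)
{
    *state = *state * 1103515245u + 12345u;
    return (*state >> 16) & 0x7fffu;   /* value in 0..32767 */
}

/* Fill a row-major n x n matrix; every row seeds its own generator. */
void fill_matrix(float *matrix, unsigned int n, unsigned int seed)
{
    assert(matrix != NULL);

    #pragma omp parallel for
    for (long row = 0; row < (long)n; row++)
    {
        /* Private per-row state: the result does not depend on
           which thread happens to process the row. */
        unsigned int state = seed + (unsigned int)row;
        for (unsigned int column = 0; column < n; column++)
        {
            matrix[(size_t)row * n + column] = (float)lcg_next(&state);
        }
    }
}
```

If the two-thread timings drop sharply with an initialization like this, the shared rand() state would be a likely suspect; if they do not, the cause lies elsewhere.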
struct timeval start_time;
struct timeval end_time;
...
gettimeofday(&start_time, NULL);
matrix(amounts[a]);
gettimeofday(&end_time, NULL);
...
printf("Time for %d threads with %d entries and scheduling %d: %f s\n\a", threads[t], amounts[a], s, (double)(end_time.tv_sec - start_time.tv_sec) + (double)(end_time.tv_usec - start_time.tv_usec)/1000000);
with basically the same results:
Time for 1 threads with 1024 entries and scheduling 0: 0.024589 s
Time for 1 threads with 8192 entries and scheduling 0: 1.393275 s
Time for 2 threads with 1024 entries and scheduling 0: 0.117452 s
Time for 2 threads with 8192 entries and scheduling 0: 25.067069 s
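The two measurements above do not report the same quantity: clock() returns processor time used by the process, which on POSIX systems such as Linux is typically summed over all threads, while gettimeofday() reports elapsed wall-clock time. A minimal sketch for taking both readings around one call follows, assuming omp_get_wtime() as the wall clock when OpenMP is enabled. The `stopwatch` type and its two functions are hypothetical helpers, not part of the program above.

```c
#include <assert.h>
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper type: one CPU-time and one elapsed-time reading. */
typedef struct
{
    clock_t cpu0;
    double wall0;
} stopwatch;

/* Elapsed seconds. With OpenMP this is wall-clock time from
   omp_get_wtime(); a serial build falls back to clock(), i.e. CPU time. */
double elapsed_seconds(void)
{
#ifdef _OPENMP
    return omp_get_wtime();
#else
    return (double)clock() / CLOCKS_PER_SEC;
#endif
}

stopwatch stopwatch_start(void)
{
    stopwatch sw;
    sw.cpu0 = clock();
    sw.wall0 = elapsed_seconds();
    return sw;
}

/* Writes CPU seconds and elapsed seconds since stopwatch_start(). */
void stopwatch_stop(const stopwatch *sw, double *cpu_s, double *wall_s)
{
    assert(sw != NULL && cpu_s != NULL && wall_s != NULL);
    *wall_s = elapsed_seconds() - sw->wall0;
    *cpu_s = (double)(clock() - sw->cpu0) / CLOCKS_PER_SEC;
}
```

Wrapping matrix(amounts[a]) between stopwatch_start() and stopwatch_stop() would print both figures side by side. With several threads the CPU figure can exceed the elapsed figure, so a rising tick count alone does not necessarily mean the run took longer; here, though, the gettimeofday() numbers show that the elapsed time grew as well.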