I am trying to optimize a code about image processing with OpenMP. I am still learning it and I came to the classic parallel for loop which is slower than my single threaded implementation.
The for loop I tried to parallelize has only 50 iterations and I saw it oculd explain why the OpenMP use in this case could be useless due to overhead operations.
But is it possible to evaluate these costs ? What are the cost differences between shared / private / firstprivate (...) clauses ?
This is the for loop I want to parallelize:
#pragma omp parallel for \
shared(img_in, img_height, img_width, w, update_gauss, MoGInitparameters, nb_motion_bloc) \
firstprivate(img_fg, gaussStruct) \
private(ContrastHisto, j, x, y) \
schedule(dynamic, 1) \
num_threads(max_threads)
for(i = 0; i<h; i++){ // h=50
for(j = 0; j<w; j++){
ComputeContrastHistogram_generic1D_rect(img_in, H_DESC_STEP*i, W_DESC_STEP*j, ContrastHisto, H_DESC_SIZE, W_DESC_SIZE, 2, img_height, img_width);
x = i;
y = w - 1 - j;
img_fg->data[y][x] = 1-MatchMoG_GaussianInt(ContrastHisto, gaussStruct->gauss[i*w+j], update_gauss, MoGInitparameters);;
#pragma omp critical
{
nb_motion_bloc += img_fg->data[y][x];
nb_motion_bloc += img_fg->data[y][x];
}
}
}
Maybe I am doing some mistakes but if that's the case please tell me why !!