I am writing my first OpenMP project. This is my code:
myFooFunction{
    int64_t Gm = 0;
    double* dist = (double*)middleManDouble;
    int64_t LengthofData = Frames * Height * Width;
    mexEvalString("tic");
    if (BitDepth == 10){
        const unsigned __int16* src__int16 = (unsigned __int16*)middleMan;
        //#pragma omp parallel
        //#pragma omp for
        #pragma omp parallel for
        for (Gm = 0; Gm < LengthofData; ++Gm){
            dist[Gm] = (double)(src__int16[Gm]);
        }
    }
    else if (BitDepth == 8){
        const unsigned __int8* src__int8 = (unsigned __int8*)middleMan;
        //#pragma omp parallel
        //#pragma omp for
        #pragma omp parallel for
        for (Gm = 0; Gm < LengthofData; ++Gm){
            dist[Gm] = (double)(src__int8[Gm]);
        }
    }
    mexEvalString("toc");
}
But I don't see any improvement in the execution time of the for
loop, even though the utilization of all my CPU cores is above 95%. What is wrong with my code?
Am I using OpenMP correctly? I just want to run the for loop on multiple threads.
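
In case it helps to reproduce this outside MATLAB, here is a minimal sketch of the 10-bit branch as a plain C function. The names convert_u16_to_double and wall_seconds are placeholders I made up for this post and are not part of the MEX file. uint16_t stands in for unsigned __int16. I am assuming the file is built with the compiler's OpenMP switch (for example /openmp or -fopenmp); without it the pragma is ignored and the loop runs on a single thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical stand-in for the 10-bit branch: widen each 16-bit
 * sample to double. Every iteration is independent of the others. */
void convert_u16_to_double(const uint16_t *src, double *dst, int64_t n)
{
    int64_t i; /* signed loop index; implicitly private to each thread */

    assert(src != NULL && dst != NULL && n >= 0);

#pragma omp parallel for
    for (i = 0; i < n; ++i) {
        dst[i] = (double)src[i];
    }
}

/* Hypothetical timing helper: wall-clock seconds when OpenMP is enabled.
 * The fallback uses clock(), which reports CPU time rather than wall time,
 * so it is only a rough substitute for single-threaded builds. */
double wall_seconds(void)
{
#ifdef _OPENMP
    return omp_get_wtime();
#else
    return (double)clock() / (double)CLOCKS_PER_SEC;
#endif
}
```

To time it, I would call wall_seconds() immediately before and after convert_u16_to_double() and subtract the two values, which measures only the loop and not the cost of the tic and toc calls through mexEvalString.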