
I am writing my first OpenMP project. This is my work:

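```c
#include <stdint.h>
#include "mex.h"   // mexEvalString

// The original snippet omitted the signature; these parameters are
// assumed from how the variables are used in the body.
void myFooFunction(const void* middleMan, void* middleManDouble,
                   int64_t Frames, int64_t Height, int64_t Width,
                   int BitDepth)
{
    int64_t Gm = 0;
    double* dist = (double*)middleManDouble;
    int64_t LengthofData = Frames * Height * Width;
    mexEvalString("tic");
    if (BitDepth == 10) {
        // 10-bit samples are stored in 16-bit containers
        const uint16_t* src__int16 = (const uint16_t*)middleMan;
        //#pragma omp parallel
        //#pragma omp for
        #pragma omp parallel for
        for (Gm = 0; Gm < LengthofData; ++Gm) {
            dist[Gm] = (double)(src__int16[Gm]);
        }
    }
    else if (BitDepth == 8) {
        const uint8_t* src__int8 = (const uint8_t*)middleMan;
        //#pragma omp parallel
        //#pragma omp for
        #pragma omp parallel for
        for (Gm = 0; Gm < LengthofData; ++Gm) {
            dist[Gm] = (double)(src__int8[Gm]);
        }
    }
    mexEvalString("toc");
}
```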

But I don't see any improvement in the execution time of the for loop, even though utilization of all my CPU cores is above 95%. What is wrong with my code? Am I using OpenMP correctly? I just want to execute the for loop on multiple threads.
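One way to check whether the loop scales at all, independently of MATLAB's `tic`/`toc`, is to time the same conversion once with a single thread and once with all threads using `omp_get_wtime()`. The following is a minimal C sketch, not the asker's code: the names `now_seconds`, `convert_u16_to_double` and `time_conversion` are hypothetical, and the thread count is passed through the `num_threads` clause so both runs execute exactly the same loop.

```c
#include <stdint.h>
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

// Wall-clock seconds: omp_get_wtime() when built with OpenMP;
// otherwise clock(), which measures processor time on most platforms.
static double now_seconds(void)
{
#ifdef _OPENMP
    return omp_get_wtime();
#else
    return (double)clock() / CLOCKS_PER_SEC;
#endif
}

// Same element-wise conversion as the 10-bit branch in the question.
// nthreads must be positive; it only affects the num_threads clause.
static void convert_u16_to_double(const uint16_t *src, double *dst,
                                  int64_t n, int nthreads)
{
    int64_t i;
    (void)nthreads;  // unused when compiled without OpenMP
    #pragma omp parallel for num_threads(nthreads) schedule(static)
    for (i = 0; i < n; ++i)
        dst[i] = (double)src[i];
}

// Elapsed seconds for one conversion pass using nthreads threads.
static double time_conversion(const uint16_t *src, double *dst,
                              int64_t n, int nthreads)
{
    double t0 = now_seconds();
    convert_u16_to_double(src, dst, n, nthreads);
    return now_seconds() - t0;
}
```

For example, calling `time_conversion(src, dst, n, 1)` and then `time_conversion(src, dst, n, omp_get_max_threads())` on a buffer of a few hundred million elements and comparing the two results gives the actual speedup. It is worth repeating each run, because the first pass over freshly allocated memory also pays for page faults. Since each iteration only loads one small integer and stores one double, a loop like this is usually limited by memory bandwidth, so the speedup can stay far below the core count even while every core reports high utilization.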

IndustProg
  • Possible duplicate of [In an OpenMP parallel code, would there be any benefit for memset to be run in parallel?](https://stackoverflow.com/questions/11576670/in-an-openmp-parallel-code-would-there-be-any-benefit-for-memset-to-be-run-in-p) – Gilles Aug 30 '17 at 05:35
  • What are your compiler options? What compiler? What OS? – Z boson Aug 30 '17 at 08:51
  • The compiler is Microsoft Visual Studio 2013 and the OS is Windows Server 2016. – IndustProg Aug 30 '17 at 10:01
  • Did you check the link I posted? How do you think your problem differs (if at all) from what is explained there? – Gilles Aug 30 '17 at 10:57
  • Thanks Gilles, it is helpful. Now I am reading https://software.intel.com/file/40348. I am puzzled that utilization of all my CPU cores is above 95%, yet the execution does not speed up. – IndustProg Aug 30 '17 at 11:45
  • I have doubts about overcoming the overhead of launching threads. `LengthofData` is about 20,000,000,000. Is that large enough to amortize the overhead? – IndustProg Aug 30 '17 at 13:09
  • The most important thing you must include is the CPU and memory configuration. – Zulan Aug 30 '17 at 16:01
  • Two Intel Xeon E5 processors and 384 GB of RAM, divided equally between the two NUMA nodes. I think I can't use OpenMP because of the NUMA architecture. Is that correct? I use 210 GB of RAM during execution. – IndustProg Sep 04 '17 at 13:30
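For scale, the numbers in these comments already size the problem: with `LengthofData` ≈ 2×10^10, the 10-bit path reads 2 bytes and writes 8 bytes per element, about 2×10^11 bytes (200 GB) of memory traffic, roughly in line with the 210 GB of RAM reported above. With so little arithmetic per byte, the loop's speed is set by memory bandwidth rather than core count, and on a two-socket machine it also matters which NUMA node each page lives on. NUMA does not by itself rule out OpenMP; a common mitigation is first-touch placement, sketched below in C under two assumptions: the operating system places a physical page on the node of the thread that first writes to it (the default on Linux, and typically how Windows behaves too), and the output buffer can be allocated without a single thread pre-filling it (arrays allocated by MATLAB may not satisfy this). The function names are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

// Allocates n doubles and writes each element from the thread that will
// later process it, so that (on a first-touch system) each page lands on
// that thread's NUMA node. malloc is assumed not to touch the pages itself.
static double *alloc_doubles_first_touch(int64_t n)
{
    int64_t i;
    double *p = (double *)malloc((size_t)n * sizeof(double));
    if (p == NULL)
        return NULL;
    #pragma omp parallel for schedule(static)
    for (i = 0; i < n; ++i)
        p[i] = 0.0;  // first touch happens here, spread across threads
    return p;
}

// 8-bit conversion using the same schedule(static) and iteration count,
// so each thread works on the index range it touched during allocation.
static void convert_u8_to_double(const uint8_t *src, double *dst, int64_t n)
{
    int64_t i;
    #pragma omp parallel for schedule(static)
    for (i = 0; i < n; ++i)
        dst[i] = (double)src[i];
}
```

Matching `schedule(static)` loops with the same iteration count and thread count assign the same index ranges to the same threads; OpenMP 3.0 and later guarantee this, while Visual Studio 2013 implements OpenMP 2.0, where it is the usual behavior but not a documented guarantee. The sketch also assumes threads are not migrated between sockets in between the two loops. Only the output buffer is placed this way; the input buffer coming from MATLAB keeps whatever placement it already has.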

0 Answers