
I am trying to implement a task status reporting feature inside a parallel for loop. The parallelization of the for loop is done with OpenMP.

I want status reporting to be performed like this:

Work done 70%; estimated time left 3:30:05 hours.

Of course, I can calculate the "estimated time left" from the difference between the "start time" and the "current time". But it seems that I cannot calculate the "work done" accurately inside the for loop, even with a "static" declaration of the counter.
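
For reference, the "estimated time left" part is straightforward once the completed fraction is known: scale the elapsed wall-clock time by the remaining fraction. A minimal sketch using omp_get_wtime(); the helper name and its output format are my own, chosen to match the example above:

#include <cstdio>
#include <omp.h>

// Hypothetical helper: given the wall-clock start time and the completed
// fraction, estimate the remaining time by scaling the elapsed time and
// print it as h:mm:ss.
void PrintTimeLeft(double dStart, double dFraction)
{
    if (dFraction <= 0.0) return;                 // nothing measurable yet
    double dElapsed  = omp_get_wtime() - dStart;  // seconds since start
    double dTimeLeft = dElapsed * (1.0 - dFraction) / dFraction;
    int iSecs = (int)(dTimeLeft + 0.5);
    std::printf("Work done %.0f%%; estimated time left %d:%02d:%02d hours.\n",
                100.0 * dFraction, iSecs / 3600, (iSecs / 60) % 60, iSecs % 60);
}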

Some guidance would be appreciated.

Output of my code:

Values of cores : 8
Outer loop =================================
Thread 0  iCount0   
 % of work done 10
Outer loop ================================= 
Thread 0  iCount1
Outer loop ================================= 
Thread 2  iCount2
Outer loop ================================= 
Thread 7  iCount3
 % of work done 40
Outer loop =================================
Thread 5  iCount4
 % of work done 50
Outer loop =================================
Thread 3  iCount5
 % of work done 60
Outer loop =================================
Thread 4  iCount6 
 % of work done 70
Outer loop =================================
Thread 1  iCount7
 % of work done 20
 % of work done 80
Outer loop ================================= 
Thread 6  iCount8 
 % of work done 90
Outer loop ================================= 
Thread 1  iCount9  
 % of work done 100
 % of work done 30

As you can see from the last two lines of the output, I am not able to track the status of the job properly.

Here is my code:

NOTE: I have intentionally used "std::endl" rather than "\n", as flushing the output buffer somehow messes with my work% calculation. I am sure a similar scenario would arise if I performed real calculations inside the parallel for loop.

#include "stdafx.h"
#include <iostream>     // std::cout, std::endl
#include <iomanip>      // std::setfill, std::setw
#include <math.h>       /* pow */
#include <omp.h>

int main(int argc, char** argv)
  {
    // Get the number of processors in this system
    int iCPU = omp_get_num_procs();

    // Now set the number of threads
    omp_set_num_threads(iCPU);
    std::cout << "Values of cores : " << iCPU <<" \n";

    int x = 0; 
    int iTotalOuter = 10;
    static int iCount = 0;

    #pragma omp parallel for private(x) 
    for(int y = 0; y < iTotalOuter; y++) 
    { 
        std::cout << "Outer loop =================================\n" ;     
        std::cout <<"\nThread "<<omp_get_thread_num()<<"  iCount" << iCount<<std::endl;

        for(x = 0; x< 5; x++) 
        { 
            //std::cout << "Inner loop \n" ;        
        } 
        iCount = iCount + 1;        
        std::cout <<"\n % of work done " << (double)100*((double)iCount/(double)iTotalOuter)<<std::endl;
    }

  std::cin.ignore(); //Wait for user to hit enter
  return 0;
  }

UPDATE: Based on Avi Ginsburg's answer, I am trying the following:

#include "stdafx.h"
#include <iostream>     // std::cout, std::endl
#include <iomanip>      // std::setfill, std::setw
#include <math.h>       /* pow */
#include <omp.h>
void ReportJobStatus(int , int );

int main(int argc, char** argv)
  {   
    // Get the number of processors in this system
    int iCPU = omp_get_num_procs();

    // Now set the number of threads
    omp_set_num_threads(iCPU);
    std::cout << "Values of cores : " << iCPU <<" \n";

    int x = 0; 
    int iTotalOuter = 100;
    static int iCount = 0;

    #pragma omp parallel for private(x) 
    for(int y = 0; y < iTotalOuter; y++) 
    { 
        std::cout << "Outer loop =================================\n" ;     

        for(x = 0; x< 5; x++) 
        { 
            //std::cout << "Inner loop \n" ;        
        } 
        #pragma omp atomic
        iCount++;   

        std::cout<< " omp_get_thread_num(): " << omp_get_thread_num() <<"\n";
        if (omp_get_thread_num() == 0){
            ReportJobStatus(iCount, iTotalOuter);
        }

    }

  std::cin.ignore(); //Wait for user to hit enter
  return 0;
  }

Problem (updated): The problem is that the same thread executes long consecutive runs of iterations, so progress is only reported while thread 0 happens to be running and the "work done" report becomes seriously limited. How can iterations be assigned to different cores dynamically, based on which ones are free?
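
One way to keep a single thread from monopolizing long consecutive runs is to let OpenMP hand out iterations on demand with a dynamic schedule. A minimal sketch of what I understand that to look like; the empty loop body stands in for real work, and the chunk size of 1 is an arbitrary choice:

#include <iostream>
#include <omp.h>

int main()
{
    const int iTotalOuter = 100;
    int iCount = 0;

    // schedule(dynamic, 1): each thread grabs the next unclaimed iteration
    // as soon as it finishes its current one, instead of being handed a
    // large fixed chunk up front.
    #pragma omp parallel for schedule(dynamic, 1)
    for (int y = 0; y < iTotalOuter; y++)
    {
        // ... real work would go here ...
        #pragma omp atomic
        iCount++;
    }
    std::cout << iCount << " iterations done\n";
    return 0;
}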

Here is the current output of my code:

Outer loop =================================
 omp_get_thread_num(): 0

 % of work done 1
Outer loop =================================
 omp_get_thread_num(): 0

 % of work done 2
Outer loop =================================
 omp_get_thread_num(): 0

 % of work done 3
Outer loop =================================
 omp_get_thread_num(): 0

 % of work done 4
Outer loop =================================
 omp_get_thread_num(): 0

 % of work done 5
Outer loop =================================
 omp_get_thread_num(): 0

 % of work done 6
Outer loop =================================
 omp_get_thread_num(): 0

 % of work done 7
Outer loop =================================
 omp_get_thread_num(): 0

 % of work done 8
Outer loop =================================
 omp_get_thread_num(): 0

 % of work done 9
Outer loop =================================
 omp_get_thread_num(): 0

 % of work done 10
Outer loop =================================
 omp_get_thread_num(): 0

 % of work done 11
Outer loop =================================
 omp_get_thread_num(): 0

 % of work done 12
Outer loop =================================
 omp_get_thread_num(): 0

 % of work done 13
Outer loop =================================
 omp_get_thread_num(): 0

 % of work done 14
Outer loop =================================
 omp_get_thread_num(): 0

 % of work done 15
Outer loop =================================
 omp_get_thread_num(): 0

 % of work done 16
Outer loop =================================
 omp_get_thread_num(): 0

 % of work done 17
Outer loop =================================
 omp_get_thread_num(): 0

 % of work done 18
Outer loop =================================
 omp_get_thread_num(): 0
Outer loop =================================
 omp_get_thread_num(): 3
Outer loop =================================
 omp_get_thread_num(): 3
Outer loop =================================
 omp_get_thread_num(): 3
Outer loop =================================
 omp_get_thread_num(): 3
Outer loop =================================
 omp_get_thread_num(): 3
Outer loop =================================
 omp_get_thread_num(): 3
Outer loop =================================
 omp_get_thread_num(): 3
Outer loop =================================
 omp_get_thread_num(): 3
Outer loop =================================
 omp_get_thread_num(): 3
Outer loop =================================
 omp_get_thread_num(): 3
Outer loop =================================
 omp_get_thread_num(): 3
Outer loop =================================
 omp_get_thread_num(): 3
Outer loop =================================
 omp_get_thread_num(): 3
Outer loop =================================
 omp_get_thread_num(): 3
Outer loop =================================
 omp_get_thread_num(): 3
Outer loop =================================
Outer loop =================================
 omp_get_thread_num(): 1
Outer loop =================================
 omp_get_thread_num(): 1
Outer loop =================================
 omp_get_thread_num(): 1
Outer loop =================================
 omp_get_thread_num(): 1
Outer loop =================================
 omp_get_thread_num(): 1
Outer loop =================================
 omp_get_thread_num(): 1
Outer loop =================================
 omp_get_thread_num(): 1
Outer loop =================================
 omp_get_thread_num(): 1
Outer loop =================================
 omp_get_thread_num(): 1
Outer loop =================================
 omp_get_thread_num(): 1
Outer loop =================================
 omp_get_thread_num(): 1
Outer loop =================================
 omp_get_thread_num(): 1
Outer loop =================================
 omp_get_thread_num(): 1
Outer loop =================================
 omp_get_thread_num(): 1
Outer loop =================================
 omp_get_thread_num(): 1
Outer loop =================================
 omp_get_thread_num(): 1
Outer loop =================================
 omp_get_thread_num(): 1
Outer loop =================================
 omp_get_thread_num(): 1

 % of work done 19
Outer loop =================================
 omp_get_thread_num(): 0

 % of work done 54
Outer loop =================================
 omp_get_thread_num(): 0

 % of work done 55
Outer loop =================================
 omp_get_thread_num(): 0

 % of work done 56
Outer loop =================================
 omp_get_thread_num(): 0

 % of work done 57
Outer loop =================================
 omp_get_thread_num(): 0

 % of work done 58
Outer loop =================================
 omp_get_thread_num(): 0

 % of work done 59
Outer loop =================================
 omp_get_thread_num(): 0

 % of work done 60
Outer loop =================================
 omp_get_thread_num(): 0

 % of work done 61
Outer loop =================================
 omp_get_thread_num(): 0

 % of work done 62
Outer loop =================================
 omp_get_thread_num(): 6
Outer loop =================================
 omp_get_thread_num(): 6
Outer loop =================================
 omp_get_thread_num(): 6
Outer loop =================================
 omp_get_thread_num(): 6
Outer loop =================================
 omp_get_thread_num(): 6
 omp_get_thread_num(): 3
Outer loop =================================
 omp_get_thread_num(): 3
Outer loop =================================
 omp_get_thread_num(): 1
Outer loop =================================
 omp_get_thread_num(): 5
Outer loop =================================
 omp_get_thread_num(): 5
Outer loop =================================
 omp_get_thread_num(): 5
Outer loop =================================
 omp_get_thread_num(): 5
Outer loop =================================
 omp_get_thread_num(): 5
Outer loop =================================
 omp_get_thread_num(): 5
Outer loop =================================
 omp_get_thread_num(): 5
Outer loop =================================
 omp_get_thread_num(): 5
Outer loop =================================
Outer loop =================================
 omp_get_thread_num(): 4
Outer loop =================================
 omp_get_thread_num(): 4
Outer loop =================================
 omp_get_thread_num(): 4
Outer loop =================================
 omp_get_thread_num(): 4
Outer loop =================================
 omp_get_thread_num(): 4
Outer loop =================================
 omp_get_thread_num(): 4
Outer loop =================================
 omp_get_thread_num(): 4
Outer loop =================================
 omp_get_thread_num(): 4
Outer loop =================================
 omp_get_thread_num(): 4
Outer loop =================================
 omp_get_thread_num(): 4
 omp_get_thread_num(): 7
Outer loop =================================
 omp_get_thread_num(): 7
Outer loop =================================
 omp_get_thread_num(): 7
Outer loop =================================
 omp_get_thread_num(): 7
Outer loop =================================
 omp_get_thread_num(): 7
Outer loop =================================
 omp_get_thread_num(): 7
Outer loop =================================
 omp_get_thread_num(): 2
Outer loop =================================
 omp_get_thread_num(): 2
Outer loop =================================
 omp_get_thread_num(): 2
Outer loop =================================
 omp_get_thread_num(): 2
Outer loop =================================
 omp_get_thread_num(): 2
Outer loop =================================
 omp_get_thread_num(): 2
Outer loop =================================
 omp_get_thread_num(): 2 
– Garima Singh

1 Answer

Use a critical or atomic section within the loop:

#pragma omp critical
    {
        ++prog;
    }

or better:

#pragma omp atomic
++prog;

and think about only letting the master thread print the progress:

if(omp_get_thread_num() == 0)
{
  std::cout << "Progress: " << float(prog)/totalNumber;
}
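
Putting the two pieces together, a minimal sketch; the loop body stands in for real work, the names are illustrative, and the atomic read requires OpenMP 3.1 or later:

#include <iostream>
#include <omp.h>

int main()
{
    const int totalNumber = 100;
    int prog = 0;

    #pragma omp parallel for
    for (int i = 0; i < totalNumber; i++)
    {
        // ... real work ...

        #pragma omp atomic
        ++prog;                 // every thread counts its finished iteration

        if (omp_get_thread_num() == 0)
        {
            int snapshot;
            #pragma omp atomic read
            snapshot = prog;    // read prog safely while other threads update it
            std::cout << "Progress: " << float(snapshot) / totalNumber << "\n";
        }
    }
    return 0;
}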
– Avi Ginsburg
• Do you think that I should prefer atomic over critical? (http://stackoverflow.com/questions/7798010/openmp-atomic-vs-critical). This "parallel for loop" would be used for a very computationally demanding job, so I do not really want the "status of work done" calculation on the master thread to interfere with the main job. – Garima Singh Feb 02 '15 at 10:59
  • I cannot answer better than the [accepted answer](http://stackoverflow.com/a/7798994/2899559). – Avi Ginsburg Feb 02 '15 at 11:04
• Personally I'd shun `critical` and `atomic`, which both, effectively, serialise part of the computation, and just have the master thread report its own progress (see the sketch after these comments). Unless the thread loading is seriously out of balance, the master thread's progress will be representative enough of the whole computation's progress. – High Performance Mark Feb 02 '15 at 11:06
• @HighPerformanceMark Even if the work load is unbalanced, neither should incur serialization. Furthermore, `atomic` may not even incur a mutex lock, if handled by the hardware. – Avi Ginsburg Feb 02 '15 at 11:08
  • It's only serialised if multiple threads are trying to increment the variable concurrently, which is stupendously unlikely if it's long-running work. In addition, even if they did, the amount of time taken to serialize four increments to one integer will be trivial. – Puppy Feb 02 '15 at 11:20
• @AviGinsburg I tried to use the master thread inside the parallel for loop, but I am getting an error. Please see the update to my original question. Some help would be appreciated. – Garima Singh Feb 02 '15 at 11:21
• @HighPerformanceMark I am now trying to do the same as you suggested, i.e. letting the master thread report the progress. But it seems that a single thread executes long consecutive runs of iterations, and core 0 (the master thread) remains idle for a long time, so the status report does not stay representative enough. Is it possible to assign tasks based on data so that none of the cores stays idle for a long time? How would the directive preceding the loop be changed to accomplish this? – Garima Singh Feb 02 '15 at 12:10
• I can't see where your code does any real work, and I'm not sure what your current code is. I don't think you can make any judgements based on the status report of an empty program. I can't see how any thread could be idle, or busy, for a *LONG TIME* in the code you've shown. I'm not sure what you are asking now, or what your question is. Sorry, but I can't really help any more. – High Performance Mark Feb 02 '15 at 12:21
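
For completeness, a minimal sketch of the no-synchronization approach High Performance Mark describes: thread 0 counts only its own iterations and scales by the team size. The names are illustrative, and the estimate is only representative if the thread loading is roughly balanced:

#include <iostream>
#include <omp.h>

int main()
{
    const int iTotalOuter = 100;

    #pragma omp parallel
    {
        int iMine = 0;                         // private per-thread counter
        int iThreads = omp_get_num_threads();  // size of the thread team

        #pragma omp for
        for (int y = 0; y < iTotalOuter; y++)
        {
            // ... real work ...
            iMine++;

            // Thread 0 extrapolates overall progress from its own share.
            if (omp_get_thread_num() == 0)
                std::cout << " ~% of work done "
                          << 100.0 * iMine * iThreads / iTotalOuter << "\n";
        }
    }
    return 0;
}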