0

This is my test code:

void test_code() {
  omp_set_num_threads(4);
  #pragma omp parallel
  {
    int tid = omp_get_thread_num();

    #pragma omp for
    for (int i = 0; i < 4; i++)
      printf("Hello World %d %d\n", tid, i);
  }
}
...
void CResizer::ThreadMain()
{
  test_code();

  MLINFO(ObjectName(), "%s started\n", __FUNCTION__);
  CBufferMonitor bufResize("Resizer (frames):", ObjectId());    
  ...

and the result should be this:

Hello World 1 1
Hello World 3 3
Hello World 2 2
Hello World 0 0

but my actual result is:

Hello World 0 0
Hello World 0 1
Hello World 0 0
Hello World 0 1
Hello World 0 2
Hello World 0 3
Hello World 0 0
Hello World 0 1
Hello World 0 2
Hello World 0 3
Hello World 0 0
Hello World 0 0
Hello World 0 1
Hello World 0 2
Hello World 0 3
Hello World 0 0
Hello World 0 1
Hello World 0 2
Hello World 0 3
Hello World 0 0
Hello World 0 1
Hello World 0 2
Hello World 0 1
Hello World 0 2
Hello World 0 3
Hello World 0 0
Hello World 0 1
Hello World 0 2
Hello World 0 3
Hello World 0 3
Hello World 0 2
Hello World 0 3

Do you have any idea why it works like this? My system has an 8-core CPU running CentOS 7.5.

SecretAgentMan
Namuk Kim
  • Can you post the entire code? I have tried what you have posted and I am unable to reproduce your problem – dreamcrash Jun 17 '21 at 07:46
  • What compiler (and OpenMP version) did you use? I was also unable to reproduce the problem... – Laci Jun 17 '21 at 12:10
  • I use gcc 7.3.1 and I just added "add_compile_options(-fopenmp)" to CMakeLists.txt – Namuk Kim Jun 18 '21 at 02:55
  • @dreamcrash, the entire code is too long to post. In my code, Intel IPP, pthreads, and SIMD are used – Namuk Kim Jun 18 '21 at 03:06
  • updated test code – Namuk Kim Jun 18 '21 at 03:22
  • @NamukKim, is this a new application? If you recently added another technology to your application, "add_compile_options(-fopenmp)", maybe this is a case of premature optimization. Did your application already work as expected before you introduced OpenMP? Do you have a set of tests to identify any regressions? – Daniel Dearlove Jun 18 '21 at 06:12
  • @DanielDearlove, no, it is code for a company app, and I tried another CMake script for OpenMP but it didn't work either. [link](https://stackoverflow.com/a/12404666/16248362) Everything works fine except for OpenMP. – Namuk Kim Jun 18 '21 at 06:31
  • I think the only possible explanation is that `#pragma omp parallel` can only use one thread (e.g. no threads left, or your code is inside a nested parallel region and nested parallelism is not allowed), therefore `tid` is always 0 and `test_code()` was called 8 times. – Laci Jun 18 '21 at 08:09
  • So my guess is that `CResizer::ThreadMain()` is executed simultaneously by 8 threads in your application. That would produce such an output. (But without the code it is hard to tell.) – Laci Jun 18 '21 at 08:30
  • Why do you mix threads (pthreads) and OpenMP? You should choose one of them. – Laci Jun 19 '21 at 04:13
  • @Laci, several people were working together, so the code became mixed. And before executing test_code(), there is only one CResizer thread – Namuk Kim Jun 21 '21 at 06:48
  • @NamukKim, I think Laci, myself and others are making the same point. Mixing OpenMP, pthreads and other libraries/technologies is resulting in unexpected interactions between them. Since you are using IPP, maybe you can refactor around TBB too since Intel makes both. TBB has a [parallel-for](https://software.intel.com/content/www/us/en/develop/documentation/onetbb-documentation/top/onetbb-developer-guide/parallelizing-simple-loops/parallel-for.html) loop algorithm. Otherwise, using pthreads, maybe you can create some basic generic parallel algorithm wrappers such as a parallel-for loop. – Daniel Dearlove Jun 21 '21 at 08:37

2 Answers

0

First of all, I would like to emphasize that the information (code sample) you provided is not enough to answer the question definitively, but it is possible to guess what is going on.

I think the only possible explanation is that the #pragma omp parallel region in the test_code() function can only use one thread (e.g. because no more threads are available, or because your code is inside a nested parallel region and nested parallelism is not allowed), therefore tid is always 0. Moreover, test_code() (i.e. CResizer::ThreadMain()) was executed by 8 threads, so the printout is repeated 8 times. Based on your function name (ThreadMain) this seems to be a plausible explanation, but as I mentioned, without a minimal working example it is not possible to tell for sure. Anyway, I hope it helps you find the problem.

To give you an example, the following code will produce output similar to what you sent:

#include <stdio.h>
#include <omp.h>

void test_code() {
  omp_set_num_threads(4);
  #pragma omp parallel
  {
    int tid = omp_get_thread_num();

    #pragma omp for
    for (int i = 0; i < 4; i++)
      printf("Hello World %d %d\n", tid, i);
  }
}

int main() {
  // The outer region makes every thread call test_code(); the inner
  // region then runs with a single thread, so tid is always 0.
  #pragma omp parallel
  test_code();
}
Laci
  • First of all, thanks for answering my question. Before executing test_code(), there is only one CResizer thread; however, after execution it increases to 8. – Namuk Kim Jun 21 '21 at 06:47
  • It is not clear what you mean. You mean that there are 8 `CResizer` threads, but `test_code()` is only called once? What do you get if you just comment out `#pragma omp parallel` and `#pragma omp for` in `test_code()`? – Laci Jun 21 '21 at 07:29
  • Yes, test_code() is called once. And without "omp_set_num_threads(4)" the result is the same. It always creates 7 more threads if I use OpenMP – Namuk Kim Jun 21 '21 at 07:52
  • OK. What do you get if you just comment out `#pragma omp parallel` and `#pragma omp for` in `test_code()` and do not change anything else (e.g. still use OpenMP during compilation)? Do you get the expected result, or is it printed 7 more times? – Laci Jun 21 '21 at 07:59
  • Based on your answer "it always creates 7 more threads if I use OpenMP", it seems that if OpenMP is enabled `test_code()` is called 8 times, and if OpenMP is not enabled `test_code()` is executed only once. So your "mixed code" must contain other OpenMP directives that affect your code when OpenMP is turned on. I think `test_code()` is fine, but there are some "hidden" OpenMP directives somewhere in your program that change its behaviour when OpenMP is enabled. That is the reason I asked you to comment out the OpenMP directives in `test_code()` and test it with OpenMP turned on in the compiler. – Laci Jun 21 '21 at 08:21
-1

Looking at the OpenMP Quick Reference Card, you created "a team of OpenMP threads that execute the region." All the OpenMP specifications can be found here.

The output still looks wrong even if you were executing the parallel section multiple times. That is because i is shared across all threads. Also, it is unclear why you are surrounding a parallel section with a std::mutex which identifies this as C++ code. You can also set the number of threads in the parallel section from the #pragma statement. Thus the following is probably sufficient:

#pragma omp parallel num_threads(4)
{
    const int tid = omp_get_thread_num();

    #pragma omp for
    for (int i = 0; i < 4; i++)
        printf("Hello World %d %d\n", tid, i);
}
Daniel Dearlove
  • Hi Daniel, "is because i is shared across all threads" is not accurate; the OpenMP standard ensures that i will be private among threads. I have tested the OP's original code and I can't reproduce the problem – dreamcrash Jun 17 '21 at 07:45
  • @dreamcrash, thanks for pointing that out. I looked at the code but I have not executed it. From the code fragment, it looks like `i` is shared because it is declared before the parallel section and not specified in the `#pragma`. If OpenMP duplicates stack variables by default then it is my mistake and I will own it. I will not change my answer so your comment remains in context. – Daniel Dearlove Jun 17 '21 at 09:00
  • Yep, OpenMP will create a private copy of it https://stackoverflow.com/a/67073601/1366871 – dreamcrash Jun 17 '21 at 10:07
  • Thanks for answering me, but the result is still the same. I think something unknown seems to be affecting it. – Namuk Kim Jun 18 '21 at 02:16