0

The code below shows problems with with OpenMP tasking in ICL 2021.6.0 and in ICX 2022.1.0 (Clang based) Firstly, I am wondering if I am doing something fundamentally wrong in my OpenMP code and it is just showing up differently when compiled by different compilers. Assuming the code is valid OpenMP... When the function fails_intel_icl() runs under ICL, the task execution is just wrong. Some task are run twice, some not at all. Compiled by ICX/Clang it executes as I expect. When crash_icx_2022() is compiled under ICX it just crashes at runtime. I am testing using Visual Studio 20222/Debug/x64 and latest OneAPI Base and HPC installation.

Examples of incorrect runtime behaviour of the function fails_intel_icl() when compiled with ICL is as follows

Thread:12 launching task for 0,1 <--- you will note the task for pair 0,1 never runs.

Thread:12 launching task for 0,2

Thread:9 Executing task with pair 0,2 ....

#include <iostream>
#include <vector>
#include <omp.h>

std::vector<std::pair<int, std::vector<int>>> data;

void setup()
{
    std::vector<int> tmp({ 1,2,3,4,5 });
    for (int i = 0; i < 5; i++)
    {
        data.push_back({ i,tmp });
    }
}


void DoTask(int a, int b)
{
    {
#pragma omp critical
        std::cout << "Thread:" << omp_get_thread_num() << " Executing task with pair " << a << ',' << b << std::endl;
    }
}
// runs correctly under icl, but crashes at runtime with icx and clang
void crash_icx_2022()
{
#   pragma omp parallel
    {
#   pragma omp single
        {
            for (auto iter = data.begin(); iter != data.end(); ++iter)
            {
                const auto& a = iter->first;
                const auto& b = iter->second;
                for (const auto& aa : b)
                {
                    if (aa != a)
                    {
                        {
#pragma omp critical
                            std::cout << "Thread:" << omp_get_thread_num() << " launching task for " << ' ' << a << ',' << aa << std::endl;
                        }
#   pragma omp task
                        {
                            DoTask(a, aa);
                        }
                    }
                }
            }
        }
    }
}


// this compiles and runs incorrectly under icl but runs correctly with icx or clang

void fails_intel_icl()
{
#   pragma omp parallel
    {
#   pragma omp single
        {
            for (auto iter = data.begin(); iter != data.end(); ++iter)
            {
                const auto a = iter->first;
                const auto b = iter->second;
                for (const auto aa : b)
                {
                    if (aa != a)
                    {
                        {
#pragma omp critical
                            std::cout << "Thread:" << omp_get_thread_num() << " launching task for " << ' ' << a << ',' << aa << std::endl;
                        }
#   pragma omp task
                        {
                            DoTask(a, aa);
                        }
                    }
                }
            }
        }
    }
}


void testTaskingBug()
{
    setup();

    std::cout << "\nStarting test using copies\n" << std::endl;
    fails_intel_icl();
    std::cout << "\nStarting test using references" << std::endl;
    crash_icx_2022();
}
int main()
{
    testTaskingBug();
    return 0;
}

The following C++17 code will not compile under clang. Not sure if the error is real.

void clang_wont_compile()
{
#   pragma omp parallel
    {
#   pragma omp single
        {
            for (const auto& [a, b] : data)
            {
                for (const auto& aa : b)
                {
                    if (aa != a)
                    {
#   pragma omp task
                        DoTask(a, aa);
                    }
                }
            }
        }
    }
}
AndrewC
  • 1
  • 1
  • Can you show us what you expect and what you get instead? – Laci Jun 30 '22 at 06:09
  • It does not crash on [GodBolt using icx or clang](https://godbolt.org/z/9qTE61dsK) – Laci Jun 30 '22 at 06:16
  • You're not entirely up with the latest C++. Instead of the `begin/end` iteration you can use a range-based loop and the `first/second` can be done more elegantly with structured bindings. – Victor Eijkhout Jun 30 '22 at 13:12
  • I've never heard of ICL, but yes, a lot of things have been fixed in in the latest Intel compilers. If you're right and it only works in the 1API compilers, then you've found a bug in the "classic" compiler and Intel will tell you that they have stopped support of those. – Victor Eijkhout Jun 30 '22 at 13:14
  • Running on GoldBolt is not executing parallel code, so of course it runs. – AndrewC Jun 30 '22 at 16:48
  • Why do you think GodBolt not executing parallel code? You can observe 2 threads and if you use `omp_set_num_threads(4)` 4 threads will be used. – Laci Jun 30 '22 at 17:28
  • I also found that sometimes Visual Studio and Intel compiler produce an executable that crashes but runs without a problem using other compilers (or any compiler on linux). – Laci Jun 30 '22 at 19:24
  • Hmm, GodBolt only uses one thread for the first function, but you are right , if I scroll down , I see it uses 2 threads for the second function. That's quite odd. But for realistic tests, this needs to be run on, say, a 16 core 'real' machine. – AndrewC Jun 30 '22 at 20:12
  • I found 3 bugs related to reference/array badly copied in ICC during my thesis 6 years ago. AFAIK only one like this has been fixed since (which is sad for a production compiler use to run large scale scientific application then use to publish research papers). I advise you not to use references (ie. `auto&`) but basic copies. It is not a problem here to copy an int since it is copied anyway. As for `auto& b` it is fine because AFAIK the reference bugs only happen when the references are copied by OpenMP. Your code seems Ok regarding my understanding of the OpenMP/C++ specification. – Jérôme Richard Jun 30 '22 at 23:12
  • I found the clang compiler error is a long standing issue with capturing structured bindings and is supposed to be cleared up by the C++20 standard. It can be worked around quite easily. – AndrewC Jul 01 '22 at 00:43

2 Answers2

1

thanks for pointing this out. It does look like it should be valid OMP code. Maybe something on the backend with the task + critical which is throwing off the compiler and/or if it was not allowed per the spec but doesn’t seem to be the case.

Double checking with some OpenMP folks to see if we have a bug on this (or a better explanation as to the behavior).

TonyM
  • 366
  • 1
  • 4
  • A little follow up. We agree it seems to be valid OMP code as well. We will look into why it’s causing an issue with the compiler. Thanks for raising the issue and coming with a simple example for us. – TonyM Jul 06 '22 at 14:43
  • Just a follow up, we see the clang fix which wasn't pulled into ICX yet but will be. Also there is an LLVM backend issue which we think is resolved here: https://reviews.llvm.org/D122768 so eventually this will be pulled into ICX. Sorry for the headaches and again thanks for the reproducer! – TonyM Aug 08 '22 at 16:57
0

So after more investigation I seem to have answers

  • the OpenMP code is valid and all variations of the functions should
    run correctly

  • icl (intel classic) and icx (clang based) have some bugs as of the versions I have tested with

  • A newer clang compiler I able to test with (14.0.6) has
    resolved the issues and executes correctly.

AndrewC
  • 1
  • 1