GCC 8.1.0/MinGW64-compiled OpenMP program crashes looking for cygwin.s?

Question

I'm learning OpenMP in C++ using gcc 8.1.0 and MinGW64 (latest version as of this month), and I'm running into a weird debug error when my program encounters a segmentation fault.
I know the cause of the crash: attempting to create too many OpenMP threads (50,000). It's the error itself that has me puzzled. I didn't compile gcc or MinGW64 from source, I just used the installers, and I'm on Windows.
Why is it looking for cygwin.s, and why use that file structure on Windows? My code and the error message from gdb are below the closing.
I'm learning OpenMP in the process of programming a path tracer, and I think I have a workaround for the thread limit (using while (threads < runs) and letting OpenMP set the thread count automatically), but I am stumped as to the error. Is there a workaround or solution for this?
It works fine with ~10,000 threads. I know it's not actually creating 10,000 threads simultaneously, but it's what I was doing before I thought of the workaround.

Thank you for the heads up about rand() and thread safety. I ended up replacing my RNG code with some that appears to be working fine in OpenMP, and it's literally a night and day difference visually. I will try the other changes and report back. Thanks!

WOW! It runs so much faster and the image is artifact-free! Thank you!

Jadan Bliss

Final code:

#pragma omp parellel
for (j = options.height - 1; j >= 0; j--){
    for (i=0; i < options.width; i++) {
            #pragma omp parallel for reduction(Vector3Add:col)
            for (int s=0; s < options.samples; s++)
            {
                float u = (float(i) + scene_drand()) / float(options.width);
                float v = (float(j) + scene_drand()) / float(options.height);
                Ray r = cam.get_ray(u, v); // was: origin, lower_left_corner + u*horizontal + v*vertical);

                col +=  color(r, world, 0);
            }

            col /= real(options.samples);
            render.set(i,j, col);
            col = Vector3(0.0);
    }
}
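The `Vector3Add` reduction named in the pragma above is a user-defined reduction, and its declaration is not shown in the post. As a rough sketch of how such a reduction could be declared, assuming a minimal stand-in type (`Vec3`, `Vec3Add`, `sample_color` and `accumulate_samples` are all hypothetical names, not the poster's code):

```cpp
// Hypothetical minimal vector type; the poster's Vector3 is not shown.
struct Vec3 {
    double x, y, z;
    Vec3(double v = 0.0) : x(v), y(v), z(v) {}
    Vec3(double x_, double y_, double z_) : x(x_), y(y_), z(z_) {}
    Vec3& operator+=(const Vec3& o) {
        x += o.x; y += o.y; z += o.z;
        return *this;
    }
};

// Tells OpenMP how to merge two partial results (omp_out += omp_in)
// and how each thread's private copy starts out (all zeros).
#pragma omp declare reduction(Vec3Add : Vec3 : omp_out += omp_in) \
    initializer(omp_priv = Vec3(0.0))

// Hypothetical stand-in for color(r, world, 0).
Vec3 sample_color(int s) {
    return Vec3(1.0, 2.0, static_cast<double>(s));
}

// Sums the samples of one pixel; each thread accumulates into its own
// private Vec3 and the private copies are merged when the loop ends.
Vec3 accumulate_samples(int samples) {
    Vec3 col(0.0);
#pragma omp parallel for reduction(Vec3Add : col)
    for (int s = 0; s < samples; ++s) {
        col += sample_color(s);
    }
    return col;
}
```

When compiled without `-fopenmp`, the pragmas are ignored and the loop simply runs sequentially, which gives the same sum here.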

Error:

Starting program: C:\Users\Jadan\Documents\CBProjects\learnOMP\bin\Debug\learnOMP.exe
[New Thread 22136.0x6620]
[New Thread 22136.0x80a8]
[New Thread 22136.0x8008]
[New Thread 22136.0x5428]

Thread 1 received signal SIGSEGV, Segmentation fault.
___chkstk_ms () at ../../../../../src/gcc-8.1.0/libgcc/config/i386/cygwin.S:126
126     ../../../../../src/gcc-8.1.0/libgcc/config/i386/cygwin.S: No such file or directory.

Alain Merigot · Accepted Answer · 2019-01-24T10:21:49.683

Here are some remarks on your code.

Using a huge number of threads will not bring you any gain and is the probable cause of your problems. Thread creation has a time cost and a resource cost. The time cost means that thread creation will probably dominate the run time, so your parallel program will be far slower than its sequential version. As for the resource cost, each thread has its own stack segment. Its size is system dependent, but typical values are measured in MB. I do not know the characteristics of your system, but with 100000 threads, this is probably why your code is crashing. I have no explanation for the message about cygwin.s, but after a stack overflow, the behavior can be weird.

Threads are a means to parallelize code and, for data parallelism, it is most of the time useless to have more threads than the number of logical processors on your system. Let OpenMP set the thread count; you can experiment later to tune this number.
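To illustrate the point, here is a minimal sketch in which the number of iterations is much larger than the number of threads and the thread count is left to OpenMP. The helper names (`default_thread_count`, `work_item`, `sum_work`) are made up for this example, and `work_item` is only a placeholder for real work such as tracing one sample:

```cpp
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: the number of threads OpenMP would use by default
// for a parallel region (1 when compiled without OpenMP support).
int default_thread_count() {
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}

// Hypothetical stand-in for one unit of work.
double work_item(int i) {
    return 0.5 * i;
}

// No num_threads clause: the iterations are divided among the default
// team of threads, so `runs` can be far larger than the thread count.
double sum_work(int runs) {
    double total = 0.0;
#pragma omp parallel for reduction(+:total)
    for (int i = 0; i < runs; ++i) {
        total += work_item(i);
    }
    return total;
}
```

With this structure, 100000 iterations are shared among a handful of threads instead of requesting one thread per iteration. The default team size can still be changed from outside the program with the `OMP_NUM_THREADS` environment variable when tuning.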

Besides that, there are other problems.

rand() is not thread safe, as it uses a global state that will be modified concurrently by threads. rand_r() is thread safe, as the state of the random generator is not global and can be stored separately in every thread.
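rand_r() is a POSIX function and may not be available with every toolchain. As an alternative sketch of the same idea (one generator state per thread), the standard C++ `<random>` engines can be used instead. This is a different technique from rand_r(), not the one used in this answer, and the function name `average_uniform` and its seeding scheme are assumptions made for this example:

```cpp
#include <random>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: average of `runs` uniform samples in [0, 1),
// using one std::mt19937 engine per thread instead of rand().
double average_uniform(int runs, unsigned int base_seed) {
    double sum = 0.0;
#pragma omp parallel reduction(+:sum)
    {
        int tid = 0;
#ifdef _OPENMP
        tid = omp_get_thread_num();
#endif
        // Each thread owns its engine, so no generator state is shared.
        // Mixing the thread number into the seed gives each thread a
        // different sequence.
        std::seed_seq seq{base_seed, static_cast<unsigned int>(tid)};
        std::mt19937 engine(seq);
        std::uniform_real_distribution<double> dist(0.0, 1.0);

#pragma omp for
        for (int i = 0; i < runs; ++i) {
            sum += dist(engine);
        }
    }
    return runs > 0 ? sum / runs : 0.0;
}
```

The result should be close to 0.5 for a large number of runs. The exact value depends on how many threads take part, since each thread draws from its own sequence.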

You should not modify a shared variable like result without an atomic access, as concurrent accesses by several threads can lead to unexpected results. Using an atomic modification for every value is safe but not very efficient: atomic accesses are very expensive, and it is better to use a reduction, which does local accumulation in every thread and a single atomic access at the end.

#include <omp.h>
#include <iostream>
#include <stdlib.h>  // rand_r, RAND_MAX
#include <time.h>    // time

int main()
{
    int runs = 100000;
    double result = 0.0;
#pragma omp parallel
    {
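      // Per-thread initialisation of the rand_r seed. Adding the thread number
      // (rather than multiplying by it) avoids giving thread 0 a seed of 0.
      unsigned int rand_state = (unsigned int)time(NULL) + (unsigned int)omp_get_thread_num();
                     // or whatever thread-dependent seed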
#pragma omp for reduction(+:result)
      for(int i=0; i<runs; i++) 
        {
          double d = double(rand_r(&rand_state))/double(RAND_MAX);
          result += d;
        }
    }
    result /= double(runs);
    std::cout << "The computed average over " << runs << " runs was " 
           << result << std::endl;
    return 0;
}
