I have these two files, shared.c and test1.c:
/* shared.c */
#include <stdio.h>
#include <string.h>
#include <omp.h>

int ompfunc1(void) {
    int count = 0;
    #pragma omp parallel num_threads(4)
    {
        #pragma omp atomic
        count++;
    }
    printf("Inside ompfunc1 in .so: ");
    return count;
}
/* test1.c */
#include <stdlib.h>
#include <stdio.h>
#include <dlfcn.h>

int main(int argc, char **argv) {
    void *handle;
    int (*str3)(void);
    char *error;

    handle = dlopen("./libshared.so", RTLD_LAZY);
    if (!handle) {
        fputs(dlerror(), stderr);
        exit(1);
    }
    dlerror();  /* clear any existing error before calling dlsym() */
    str3 = (int (*)(void))dlsym(handle, "ompfunc1");
    if ((error = dlerror()) != NULL) {
        fputs(error, stderr);
        exit(1);
    }
    printf("\n\tcount from omp lib = %d\n", (*str3)());
    dlclose(handle);
    return 0;
}
My Makefile is:
all: test1 libshared.so
test1: test1.c
	gcc -o test1 test1.c -ldl
libshared.so: shared.o
	gcc -shared -o libshared.so shared.o -lgomp
shared.o: shared.c
	gcc -c -g -Wall -Werror -fPIC -fopenmp shared.c
Now, ./test1 gets a segmentation fault at the very end. The likely cause is some interaction between dlclose() and the OpenMP runtime used by gcc (the GNU toolchain's libgomp).
Note that:
- This segfault does not occur every time.
- On native Linux it may not occur when the program is run directly, but it shows up when running under gdb.
- On WSL it happens even if you just run the program.
- The typical error looks like this:
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Inside ompfunc1 in .so:
count from omp lib = 4
Thread 3 "test1" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffffdf70700 (LWP 4720)]
0x00007ffffe9c8f22 in ?? () from /usr/lib/x86_64-linux-gnu/libgomp.so.1
(gdb) info threads
Id Target Id Frame
1 Thread 0x7fffff7d0740 (LWP 4715) "test1" _dl_close_worker (map=map@entry=0x8402280, force=force@entry=false) at dl-close.c:794
2 Thread 0x7ffffe780700 (LWP 4719) "test1" 0x00007ffffe9c8f22 in ?? () from /usr/lib/x86_64-linux-gnu/libgomp.so.1
* 3 Thread 0x7ffffdf70700 (LWP 4720) "test1" 0x00007ffffe9c8f22 in ?? () from /usr/lib/x86_64-linux-gnu/libgomp.so.1
4 Thread 0x7ffffd760700 (LWP 4721) "test1" 0x00007ffffe9c8f22 in ?? () from /usr/lib/x86_64-linux-gnu/libgomp.so.1
As you can see, something looks odd about the worker threads: gdb shows no function context for their frames inside libgomp.so.1 (the ?? entries), while the main thread is inside _dl_close_worker. My guess is that the main thread unloads the library via dlclose() before the other spawned threads have finished, but I am not sure exactly what is happening.
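One way to check this guess is to ask the dynamic loader whether libgomp.so.1 is still mapped right before and right after the dlclose() call in test1.c. The sketch below assumes glibc, where dlopen() with the RTLD_NOLOAD flag never loads anything and only returns a handle if the object is already in the process; is_loaded() is a hypothetical helper name, not part of any library.

```c
#define _GNU_SOURCE   /* harmless on glibc; makes sure RTLD_NOLOAD is visible */
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if the shared object named `soname`
 * is currently mapped into this process, 0 otherwise.
 * RTLD_NOLOAD never loads the object; on success it only increments
 * the reference count, so that extra reference is dropped again. */
int is_loaded(const char *soname)
{
    void *h = dlopen(soname, RTLD_LAZY | RTLD_NOLOAD);
    if (h == NULL)
        return 0;
    dlclose(h);   /* undo the reference taken by the probe itself */
    return 1;
}
```

In test1.c it could be used as printf("libgomp before: %d\n", is_loaded("libgomp.so.1")); dlclose(handle); printf("libgomp after: %d\n", is_loaded("libgomp.so.1"));. If the program gets that far and the second value is 0, dlclose() has unmapped libgomp even though its pool threads (threads 2 to 4 above) still exist.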
Some related discussion can be found in the question "Destroying threads in Openmp (C++)".
I have run a few experiments, both on native Linux and under WSL on Windows 10:
a) On a Linux system (Ubuntu)
- I used the fork() system call, did all the work in the child process, and made the parent wait with wait(NULL) so that the main thread keeps running, as shown below:
#include <stdlib.h>
#include <stdio.h>
#include <dlfcn.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(int argc, char **argv) {
    pid_t pid = fork();
    if (pid == 0) {
        /* child: load the library, call it, unload it */
        void *handle;
        int (*str3)(void);
        char *error;
        printf("Child => PPID: %d PID: %d\n", getppid(), getpid());
        handle = dlopen("./libshared.so", RTLD_NOW);
        if (!handle) {
            fputs(dlerror(), stderr);
            exit(1);
        }
        dlerror();  /* clear any existing error before calling dlsym() */
        str3 = (int (*)(void))dlsym(handle, "ompfunc1");
        if ((error = dlerror()) != NULL) {
            fputs(error, stderr);
            exit(1);
        }
        printf("\n\tcount from omp lib = %d\n", (*str3)());
        dlclose(handle);
        exit(EXIT_SUCCESS);
    }
    else if (pid > 0) {
        /* parent: keep the main thread alive until the child is done */
        printf("Parent => PID: %d\n", getpid());
        printf("Waiting for child process to finish.\n");
        wait(NULL);
        printf("Child process finished.\n");
    }
    else {
        perror("fork");
        return 1;
    }
    printf("parent and child are finishing ... \n");
    return 0;
}
This works, but unfortunately, for technical reasons, I cannot adopt this solution in our real project.
- I added a sleep(1) call before dlclose() in test1.c, and that also works, although I do not know why. I also tried the C++ chrono library with a sleep of 100 ms, which works too; anything shorter does not, and again I am not sure why. Could it be that the process goes into an (uninterruptible?) sleep state that somehow affects signal handling? I would like to know more.
- I used the Intel compiler (oneAPI) to compile, as shown below:
icc -fPIC -qopenmp -c shared.c
icc -shared -o libshared.so shared.o
icc -o test1 test1.c -ldl -qopenmp
and that works fine under gdb-oneapi:
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Inside ompfunc1 in .so:
count from omp lib = 4
[Thread 0x7ffff2a74880 (LWP 121684) exited]
[Thread 0x7ffff3276800 (LWP 121683) exited]
[Thread 0x7ffff3a78780 (LWP 121682) exited]
[Inferior 1 (process 121678) exited normally]
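The sleep experiment from part a) can be written as a variant of test1.c in which the delay before dlclose() is a parameter, so that 1000 ms, 100 ms and shorter values can be compared in one program. This is only a sketch: run_with_delay() is a hypothetical helper name, and POSIX nanosleep() stands in for both sleep(1) and the C++ chrono sleep used above. It returns the value computed by the library function, or -1 if the library or symbol cannot be found.

```c
#define _POSIX_C_SOURCE 200809L   /* for nanosleep() */
#include <dlfcn.h>
#include <errno.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: load `path`, call the int(void) function `sym`,
 * wait `delay_ms` milliseconds, then dlclose() the library.
 * Returns the function's result, or -1 on a loader error. */
int run_with_delay(const char *path, const char *sym, long delay_ms)
{
    void *handle = dlopen(path, RTLD_LAZY);
    if (handle == NULL) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return -1;
    }

    dlerror();  /* clear any stale error before dlsym() */
    int (*fn)(void) = (int (*)(void))dlsym(handle, sym);
    const char *err = dlerror();
    if (err != NULL) {
        fprintf(stderr, "dlsym: %s\n", err);
        dlclose(handle);
        return -1;
    }

    int result = fn();

    /* The delay under test, placed right before dlclose() as in the
     * experiment above. If a signal interrupts nanosleep(), it stores the
     * remaining time in ts and the loop sleeps for the rest. */
    struct timespec ts = { .tv_sec = delay_ms / 1000,
                           .tv_nsec = (delay_ms % 1000) * 1000000L };
    while (nanosleep(&ts, &ts) == -1 && errno == EINTR)
        ;

    dlclose(handle);
    return result;
}
```

In main() this would replace the dlopen/dlsym/dlclose sequence, for example printf("count = %d\n", run_with_delay("./libshared.so", "ompfunc1", 100));.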
b) WSL (v1) on Windows 10
- Here the program segfaults right away, even without gdb.
- We also debugged it in the Visual Studio 2019 IDE: with a breakpoint at sleep(1) and stepping line by line, it still segfaults.
As of now, I do not fully understand all of this output and would like to know in more detail what is happening. It could be caused by the combination of the GNU toolchain, OpenMP and dlclose(). I found a reference to a similar gcc problem here: https://forum.openmp.org/viewtopic.php?f=3&t=552&hilit=dlclose
By the way, a hacky workaround is simply to skip the final dlclose() call while loading a dummy empty .so file, but that is not an elegant solution.
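A more targeted variant of this workaround (my assumption, not one of the experiments above, and not tested on WSL) is to let libshared.so be closed normally but keep only the OpenMP runtime from being unloaded. glibc's dlopen() accepts RTLD_NODELETE, which tells the loader never to unmap that object on dlclose(); opening libgomp.so.1 this way before loading libshared.so should keep the worker threads' code mapped until the process exits. pin_library() is a hypothetical helper name.

```c
#define _GNU_SOURCE   /* harmless on glibc; makes sure RTLD_NODELETE is visible */
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: load (or find the already loaded) `soname` and
 * mark it RTLD_NODELETE so that later dlclose() calls never unmap it.
 * The handle is intentionally never closed.
 * Returns 1 on success, 0 on failure. */
int pin_library(const char *soname)
{
    void *h = dlopen(soname, RTLD_NOW | RTLD_NODELETE);
    if (h == NULL) {
        fprintf(stderr, "dlopen(%s): %s\n", soname, dlerror());
        return 0;
    }
    return 1;
}
```

In test1.c this would be called once at the top of main(), for example if (!pin_library("libgomp.so.1")) exit(1);, before dlopen("./libshared.so", RTLD_LAZY).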
Anyway, I would appreciate it if someone could explain these observations and point me to an elegant solution for this problem.
Thank you all in advance for your help!
I did some further investigation.
I downloaded the AOCC 3.1 compiler and built libshared.so with AMD's OpenMP implementation. I linked it with my test1, and it runs fine:
Inside ompfunc1 in .so:
count from omp lib = 4
[Thread 0x7ffff5388880 (LWP 140265) exited]
[Thread 0x7ffff5b8a800 (LWP 140264) exited]
[Thread 0x7ffff638c780 (LWP 140263) exited]
[Inferior 1 (process 140262) exited normally]
And we can see that we are using /opt/AMD/aocc-compiler-3.1.0/lib/libomp.so from AOCC 3.1:
$ ldd libshared.so
linux-vdso.so.1 (0x00007ffc6d9fe000)
libomp.so => /opt/AMD/aocc-compiler-3.1.0/lib/libomp.so (0x00007fdfe6f27000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fdfe6b36000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fdfe6917000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fdfe670f000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fdfe650b000)
/lib64/ld-linux-x86-64.so.2 (0x00007fdfe7405000)
So my understanding is that the problem lies in the GNU OpenMP library (libgomp).
To make this clear, here is the output when using gcc's OpenMP implementation:
azureuser@test:~/OpenMP_test$ ldd libshared.so
linux-vdso.so.1 (0x00007ffd5998f000)
libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1 (0x00007f55fc596000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f55fc1a5000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f55fbfa1000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f55fbd82000)
/lib64/ld-linux-x86-64.so.2 (0x00007f55fc9c7000)
azureuser@AzureSolaraTestHB:~/OpenMP_test$ gdb ./test1
GNU gdb (Ubuntu 8.1.1-0ubuntu1) 8.1.1
Reading symbols from ./test1...(no debugging symbols found)...done.
(gdb) r
Starting program: /home/azureuser/OpenMP_test/test1
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff718d700 (LWP 39597)]
[New Thread 0x7ffff698c700 (LWP 39598)]
[New Thread 0x7ffff618b700 (LWP 39599)]
Inside ompfunc1 in .so:
count from omp lib = 4
Thread 3 "test1" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff698c700 (LWP 39598)]
0x00007ffff73c5f22 in ?? () from /usr/lib/x86_64-linux-gnu/libgomp.so.1
(gdb) i threads
Id Target Id Frame
1 Thread 0x7ffff7fe9740 (LWP 39593) "test1" _dl_close_worker (map=map@entry=0x555555756280, force=force@entry=false) at dl-close.c:794
2 Thread 0x7ffff718d700 (LWP 39597) "test1" 0x00007ffff73c5f26 in ?? () from /usr/lib/x86_64-linux-gnu/libgomp.so.1
* 3 Thread 0x7ffff698c700 (LWP 39598) "test1" 0x00007ffff73c5f22 in ?? () from /usr/lib/x86_64-linux-gnu/libgomp.so.1
4 Thread 0x7ffff618b700 (LWP 39599) "test1" 0x00007ffff73c5f22 in ?? () from /usr/lib/x86_64-linux-gnu/libgomp.so.1
(gdb)
And here is the output when using the OpenMP implementation from AOCC 3.1:
azureuser@test:~/OpenMP_test$ ldd libshared.so
linux-vdso.so.1 (0x00007ffc6e1f6000)
libomp.so => /opt/AMD/aocc-compiler-3.1.0/lib/libomp.so (0x00007f3497c1c000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f349782b000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f349760c000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f3497404000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f3497200000)
/lib64/ld-linux-x86-64.so.2 (0x00007f3497ef8000)
azureuser@AzureSolaraTestHB:~/OpenMP_test$ gdb ./test1
GNU gdb (Ubuntu 8.1.1-0ubuntu1) 8.1.1
Reading symbols from ./test1...done.
(gdb) r
Starting program: /home/azureuser/OpenMP_test/test1
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff658e780 (LWP 39545)]
[New Thread 0x7ffff5d8c800 (LWP 39546)]
[New Thread 0x7ffff558a880 (LWP 39547)]
Inside ompfunc1 in .so:
count from omp lib = 4
[Thread 0x7ffff558a880 (LWP 39547) exited]
[Thread 0x7ffff5d8c800 (LWP 39546) exited]
[Thread 0x7ffff658e780 (LWP 39545) exited]
[Inferior 1 (process 39541) exited normally]
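ldd only lists the dependencies of libshared.so on disk. To confirm from inside test1 which OpenMP runtime is actually mapped into the process (libgomp.so.1 for the gcc build, libomp.so for the AOCC build), and whether it is still mapped after dlclose(), the loaded objects can be listed at run time. This is a sketch using glibc's dl_iterate_phdr(); print_loaded_matching() is a hypothetical helper that prints every loaded object whose path contains a given substring and returns how many it found.

```c
#define _GNU_SOURCE   /* needed for dl_iterate_phdr() in <link.h> */
#include <link.h>
#include <stdio.h>
#include <string.h>

struct match_ctx {
    const char *needle;   /* substring to look for in the object's path */
    int count;            /* number of matching objects seen so far */
};

/* Called by dl_iterate_phdr() once per loaded object. */
static int match_cb(struct dl_phdr_info *info, size_t size, void *data)
{
    (void)size;
    struct match_ctx *ctx = data;
    const char *name = info->dlpi_name;   /* "" for the main program */
    if (name != NULL && strstr(name, ctx->needle) != NULL) {
        printf("loaded: %s\n", name);
        ctx->count++;
    }
    return 0;   /* 0 means: keep iterating */
}

/* Hypothetical helper: print and count loaded objects whose path
 * contains `needle`, e.g. "libgomp" or "libomp". */
int print_loaded_matching(const char *needle)
{
    struct match_ctx ctx = { needle, 0 };
    dl_iterate_phdr(match_cb, &ctx);
    return ctx.count;
}
```

Calling print_loaded_matching("libgomp") and print_loaded_matching("libomp") once after dlopen() and once after dlclose() in test1.c would show which runtime was loaded and whether it survived the dlclose() call.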