
I have a scientific C++ application that is parallelized with OpenMP and typically compiled with GCC/8.2.0. The application further depends on gsl and fftw, the latter using OpenMP as well. The application uses a C API to access a Fortran library that is also parallelized with OpenMP and can use either Intel's MKL or openblas as a backend. The library is preferably compiled with the Intel/19.1.0 toolchain. I have successfully compiled, linked, and tested everything using GCC/8.2.0 and openblas (as a baseline). However, test studies on minimal examples suggest that MKL with the Intel toolchain would be faster, and speed is important for my use case.

icc --version gives me: icc (ICC) 19.1.0.166 20191121; the operating system is CentOS 7. Bear in mind I'm on a cluster and have limited control over what I can install. Software is centrally managed using spack, and environments are loaded by specifying a compiler layer (only one at a time).

I have considered different approaches for getting the Intel/MKL library into my code:

  1. Compile both the C++ and the Fortran code with the Intel toolchain. While that is probably the tidiest solution, the compiler throws "internal error: 20000_7001" for one particular file with an OMP include. I could not find documentation for that error code and have not gotten feedback from Intel either (https://community.intel.com/t5/Intel-C-Compiler/Compilation-error-internal-error-20000-7001/m-p/1365208#M39785). I allocated > 80 GB of memory for compilation, as I have seen the compiler break down before when resources were limited. Maybe someone here has seen that error code?

  2. Compile the C++ and Fortran code with GCC/8.2.0 but link dynamically against Intel-compiled MKL as the backend for the Fortran library. I managed to do that from the GCC/8.2.0 layer by extending LIBRARY_PATH and LD_LIBRARY_PATH to where MKL lives on the cluster (see the sketch after this list). It seems that only GNU OMP is linked and that MKL was found. Analysis shows that CPU load is quite low (but higher than for the binary from the GCC/8.2.0 + openblas set-up). Execution time of my program improves by ~30%. However, in at least one case I got this runtime error when running the binary on 20 cores: libgomp: Thread creation failed: Resource temporarily unavailable.

  3. Stick with GCC/8.2.0 for my C++ code and link dynamically against the precompiled Fortran library, which was itself built with Intel/MKL and uses Intel OMP. This approach turned out to be tricky. As with approach (2), I loaded the GCC environment and manually extended LD_LIBRARY_PATH. A minimal example that is not itself OMP-parallelized worked beautifully out of the box. However, even though I also managed to compile and link my full C++ program, I got an immediate runtime error as soon as the first OMP call inside the Fortran library occurs.
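
For reference, here is a minimal sketch of the environment extension used in approaches (2) and (3); the MKL and Intel compiler runtime paths are taken from the ldd output below, and INTEL_ROOT is just shorthand I introduce here:

# Load the GCC/8.2.0 layer as usual, then point the linker (LIBRARY_PATH) and the
# runtime loader (LD_LIBRARY_PATH) at the Intel MKL and compiler runtime directories.
INTEL_ROOT=/cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux
export LIBRARY_PATH=$INTEL_ROOT/mkl/lib/intel64:$INTEL_ROOT/compiler/lib/intel64:$LIBRARY_PATH
export LD_LIBRARY_PATH=$INTEL_ROOT/mkl/lib/intel64:$INTEL_ROOT/compiler/lib/intel64:$LD_LIBRARY_PATH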

Here is the output of ldd for the binary from approach (3):

linux-vdso.so.1 => (0x00007fff2d7bb000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00002ab227c25000)
libgsl.so.25 => /cluster/apps/gcc-8.2.0/gsl-2.6-x4nsmnz6sgpkm7ksorpmc2qdrkdxym22/lib/libgsl.so.25 (0x00002ab227e41000)
libgslcblas.so.0 => /cluster/apps/gcc-8.2.0/gsl-2.6-x4nsmnz6sgpkm7ksorpmc2qdrkdxym22/lib/libgslcblas.so.0 (0x00002ab228337000)
libfftw3.so.3 => /cluster/apps/gcc-8.2.0/fftw-3.3.9-w5zvgavdpyt5z3ryppx3uwbfg27al4v6/lib/libfftw3.so.3 (0x00002ab228595000)
libz.so.1 => /lib64/libz.so.1 (0x00002ab228a36000)
libfftw3_omp.so.3 => /cluster/apps/gcc-8.2.0/fftw-3.3.9-w5zvgavdpyt5z3ryppx3uwbfg27al4v6/lib/libfftw3_omp.so.3 (0x00002ab228c4c000)
libxtb.so.6 => /cluster/project/igc/iridiumcc/intel-19.1.0/xtb/build/libxtb.so.6 (0x00002ab228e53000)
libstdc++.so.6 => /cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-8.2.0-6xqov2fhvbmehix42slain67vprec3fs/lib64/libstdc++.so.6 (0x00002ab22a16d000)
libm.so.6 => /lib64/libm.so.6 (0x00002ab22a4f1000)
libgomp.so.1 => /cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-8.2.0-6xqov2fhvbmehix42slain67vprec3fs/lib64/libgomp.so.1 (0x00002ab22a7f3000)
libgcc_s.so.1 => /cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-8.2.0-6xqov2fhvbmehix42slain67vprec3fs/lib64/libgcc_s.so.1 (0x00002ab22aa21000)
libc.so.6 => /lib64/libc.so.6 (0x00002ab22ac39000)
/lib64/ld-linux-x86-64.so.2 (0x00002ab227a01000)
libmkl_intel_lp64.so => /cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64/libmkl_intel_lp64.so (0x00002ab22b007000)
libmkl_intel_thread.so => /cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64/libmkl_intel_thread.so (0x00002ab22bb73000)
libmkl_core.so => /cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64/libmkl_core.so (0x00002ab22e0df000)
libifcore.so.5 => /cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64/libifcore.so.5 (0x00002ab2323ff000)
libimf.so => /cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64/libimf.so (0x00002ab232763000)
libsvml.so => /cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64/libsvml.so (0x00002ab232d01000)
libirng.so => /cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64/libirng.so (0x00002ab234688000)
libiomp5.so => /cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64/libiomp5.so (0x00002ab2349f2000)
libintlc.so.5 => /cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64/libintlc.so.5 (0x00002ab234de2000)
libdl.so.2 => /lib64/libdl.so.2 (0x00002ab235059000)

I did some research and found interesting discussions, both here and in Intel's documentation, regarding crashes when two different OMP implementations are mixed:

  • Telling GCC to *not* link libgomp so it links libiomp5 instead
  • https://www.intel.com/content/www/us/en/develop/documentation/cpp-compiler-developer-guide-and-reference/top/optimization-and-programming-guide/openmp-support/openmp-library-support/using-the-openmp-libraries.html
  • http://www.nacad.ufrj.br/online/intel/Documentation/en_US/compiler_c/main_cls/optaps/common/optaps_par_compat_libs_using.htm

I followed the guidelines provided for the Intel OpenMP compatibility libraries. Compilation of my C++ code was done from the GCC environment using the -fopenmp flag as always. At the linking stage (g++), I used the same linker command as usual but replaced -fopenmp with -L/cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64 -liomp5 -lpthread. The resulting binary runs like a charm and is roughly twice as fast as my original build (GCC/openblas).
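
To make this concrete, here is a minimal sketch of the compile and link steps; the source/object/binary names are placeholders, the library list is illustrative (taken from the ldd output below), and the -L directories for gsl, fftw, and the Fortran library are assumed to come from the loaded environment and the path extension shown above:

# Compile from the GCC/8.2.0 layer; the OpenMP pragmas still need -fopenmp here.
g++ -O2 -fopenmp -c main.cpp -o main.o

# Link WITHOUT -fopenmp so that g++ does not add libgomp; instead link Intel's
# OpenMP compatibility library libiomp5 explicitly.
g++ main.o -o myapp \
  -L/cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64 \
  -liomp5 -lpthread \
  -lxtb -lgsl -lgslcblas -lfftw3 -lfftw3_omp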

Here is the output of ldd for this new binary:

linux-vdso.so.1 =>  (0x00007ffd7eb9a000)
libiomp5.so => /cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64/libiomp5.so (0x00002b4fb08da000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00002b4fb0cca000)
libgsl.so.25 => /cluster/apps/gcc-8.2.0/gsl-2.6-x4nsmnz6sgpkm7ksorpmc2qdrkdxym22/lib/libgsl.so.25 (0x00002b4fb0ee6000)
libgslcblas.so.0 => /cluster/apps/gcc-8.2.0/gsl-2.6-x4nsmnz6sgpkm7ksorpmc2qdrkdxym22/lib/libgslcblas.so.0 (0x00002b4fb13dc000)
libfftw3.so.3 => /cluster/apps/gcc-8.2.0/fftw-3.3.9-w5zvgavdpyt5z3ryppx3uwbfg27al4v6/lib/libfftw3.so.3 (0x00002b4fb163a000)
libz.so.1 => /lib64/libz.so.1 (0x00002b4fb1adb000)
libfftw3_omp.so.3 => /cluster/apps/gcc-8.2.0/fftw-3.3.9-w5zvgavdpyt5z3ryppx3uwbfg27al4v6/lib/libfftw3_omp.so.3 (0x00002b4fb1cf1000)
libxtb.so.6 => /cluster/project/igc/iridiumcc/intel-19.1.0/xtb/build/libxtb.so.6 (0x00002b4fb1ef8000)
libstdc++.so.6 => /cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-8.2.0-6xqov2fhvbmehix42slain67vprec3fs/lib64/libstdc++.so.6 (0x00002b4fb3212000)
libm.so.6 => /lib64/libm.so.6 (0x00002b4fb3596000)
libgcc_s.so.1 => /cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-8.2.0-6xqov2fhvbmehix42slain67vprec3fs/lib64/libgcc_s.so.1 (0x00002b4fb3898000)
libc.so.6 => /lib64/libc.so.6 (0x00002b4fb3ab0000)
/lib64/ld-linux-x86-64.so.2 (0x00002b4fb06b6000)
libdl.so.2 => /lib64/libdl.so.2 (0x00002b4fb3e7e000)
libgomp.so.1 => /cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-8.2.0-6xqov2fhvbmehix42slain67vprec3fs/lib64/libgomp.so.1 (0x00002b4fb4082000)
libmkl_intel_lp64.so => /cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64/libmkl_intel_lp64.so (0x00002b4fb42b0000)
libmkl_intel_thread.so => /cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64/libmkl_intel_thread.so (0x00002b4fb4e1c000)
libmkl_core.so => /cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64/libmkl_core.so (0x00002b4fb7388000)
libifcore.so.5 => /cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64/libifcore.so.5 (0x00002b4fbb6a8000)
libimf.so => /cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64/libimf.so (0x00002b4fbba0c000)
libsvml.so => /cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64/libsvml.so (0x00002b4fbbfaa000)
libirng.so => /cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64/libirng.so (0x00002b4fbd931000)
libintlc.so.5 => /cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64/libintlc.so.5 (0x00002b4fbdc9b000)

Unlike in approach (2), the binary is linked against both libiomp5 and libgomp. I suspect that I get references to libgomp because I link to libfftw3_omp, which was compiled with GCC/8.2.0. I find it quite puzzling that ldd lists exactly the same libraries as for my first attempt with approach (3); only the order seems to have changed (libiomp5 now before libgomp).

While I am quite happy to have gotten a working binary in the end, I have some questions I could not resolve by myself:

  • do you interpret Intel's documentation and the previous SO post as I do and agree that the Intel OpenMP compatibility libraries are applicable in my case and that I have used the correct workflow? Or do you think approach (3) is a recipe for disaster in the future?

  • do any of you have more experience with Intel's C++ compiler and have seen the error code described in approach (1)? (see update below)

  • do you think it's worth investigating whether I can completely get rid of libgomp by, for example, manually linking against an Intel-compiled libfftw3_omp that depends only on libiomp5? (see update below)

  • do you have an explanation why thread creation fails in some cases using approach (2)?

Thank you very much in advance!

// Update: In the meantime I managed to tweak approach (3): instead of linking against GCC/8.2.0-compiled gsl and fftw, I now link against Intel/19.1.0-compiled gsl and fftw. The resulting binary is similar in speed to what I had before; however, it links only against libiomp5.so, which seems like the cleaner solution to me.

// Update: Manually excluding compiler optimizations for the files that throw internal errors in CMakeLists.txt (see "CMake: how to disable optimization of a single *.cpp file?") gave me a working binary, albeit with linker warnings. A sketch of the per-file change is shown below.
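
For reference, a minimal sketch of the per-file override in CMakeLists.txt, assuming the offending translation unit is called problem_file.cpp (hypothetical name):

# Build the file that triggers the internal compiler error without optimization,
# while the rest of the project keeps its normal flags.
set_source_files_properties(problem_file.cpp PROPERTIES COMPILE_FLAGS "-O0")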

  • Regarding the internal error it would be interesting to see if it happens with a different version of the compiler. Most clusters either run a module system, allowing you to choose from several versions, or at the very least support some sort of containers (or of course both). I would recommend trying the newer version in case this is some sort of bug with the version you have been using thus far. – Qubit Mar 05 '22 at 20:56
  • You're right, we do have different versions of icc. Version from 2018 gives the same error and the new 2022 LLVM based version does not quite have all libraries installed on our cluster. – iridiumcc Mar 05 '22 at 21:06
  • What do you mean "does not quite have all libraries installed on our cluster"? If the compiler is there, what is missing? – Qubit Mar 05 '22 at 21:26
  • Note that GCC and ICC use different OpenMP runtimes that are not really compatible. ICC/Clang are based on IOMP and GCC on GOMP. AFAIK, ICC/Clang experimentally support GOMP but GCC does not support IOMP. Generally, it is better to compile everything with one compiler. It is fine if a library uses a different runtime as long as its code is not mixed with that of another library using another runtime (eg. callbacks called in a parallel region or nested parallel regions). – Jérôme Richard Mar 05 '22 at 21:58
  • @Qubit: the cluster support has not installed gsl and fftw under this new compiler layer. However, I linked against Intel/19.1.0 compiled binaries and had trouble with the same file. I will try to take OMP parallelization out of this file and try again. – iridiumcc Mar 06 '22 at 10:03
  • @JeromeRichard: Yes, that is exactly what I am trying to solve with my issue. That is why I was referring to the Intel OpenMP compatibility libraries. – iridiumcc Mar 06 '22 at 10:03
  • For the "internal error": lower the optimization level. That often/sometimes helps. – Victor Eijkhout Mar 06 '22 at 11:17
  • @iridiumcc As far as I recall MKL comes with fftw, as for gsl, you can always download it into your home directory (or whatever directory is intended for that on your cluster) and build it there, then link against that. – Qubit Mar 06 '22 at 12:05
  • To follow up on Victor's suggestion: I can compile with Intel 19 / Intel 22 using Debug mode. However, I still have trouble with one file containing the following OMP code:

    int size = 1;
    #ifdef OMP
    int tid;
    #pragma omp parallel private(tid)
    {
        tid = omp_get_thread_num();
        if (tid == 0) {
            size = omp_get_num_threads();
        }
    }
    #endif

    Even after successful compilation, I get these linker warnings. Does that seem like a type of ABI problem to you? – iridiumcc Mar 06 '22 at 17:14
  • ld: Warning: size of symbol `__bid_ten2mxtrunc192' changed from 1792 in /cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libdecimal.a(bid128.o) to 1344 in /usr/lib/gcc/x86_64-redhat-linux/4.8.5//libgcc.a(bid128.o) [...] ld: Warning: size of symbol `__bid_midpoint192' changed from 640 in /cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libdecimal.a(bid128.o) to 480 in /usr/lib/gcc/x86_64-redhat-linux/4.8.5//libgcc.a(bid128.o) – iridiumcc Mar 06 '22 at 17:17
  • @Qubit I managed to compile my C++ code both with Intel/19.1.0 and Intel/2022 after I manually excluded the two files that give me trouble from optimization in my CMakeLists.txt (https://stackoverflow.com/questions/33540578/cmake-how-to-disable-optimization-of-a-single-cpp-file). For Intel/2022, I manually linked against gsl and fftw that were compiled with Intel/19.1.0. I am currently running benchmarks to see which compiler setting gives the best performance. However, I am still puzzled by the linker warnings and not sure whether solution (3) is not preferable, given that it does not produce them. – iridiumcc Mar 06 '22 at 20:42
  • The linker error seems like you link against two libraries that provide the same symbols (that is, the same name but apparently not the same size). Whether or not this ends up causing problems with the linked application however, I'm not sure. If it consistently links all instances to one of the two, it could even be fine. Probably best to fix the warning or check that everything is fine though. – Qubit Mar 07 '22 at 07:37

0 Answers