4

I am training a neural network using TensorFlow on a CentOS HPC cluster. However, I get this error at the start of the training process:

OMP: Error #15: Initializing libiomp5.so, but found libiomp5.so already initialized. OMP: Hint: This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://www.intel.com/software/products/support/.

The code is for instance segmentation and it worked fine for many people, but failed in my case.

Why does it occur? How can I solve it?

Kunyu Shi
  • Try to read the error message; both of your questions are answered there. – Taku Apr 17 '18 at 14:27
  • @abccd That message clearly says that the solution is an unsafe, unsupported, undocumented workaround. – Kunyu Shi Apr 18 '18 at 05:53
  • Read it carefully. "The best thing to do is to ensure that only a single OpenMP runtime is linked into the process". You have to state clearly what you're running and what's causing the error. Stating that the code works for others but not for you wouldn't help anyone help you. All anyone can tell you right now is to use the unsafe workaround or run a single OpenMP runtime at a time. – Taku Apr 18 '18 at 06:19
  • @abccd You are right, I should describe more details. I've solved this problem. Thank you. – Kunyu Shi Apr 18 '18 at 07:03

4 Answers

8

I had a similar issue on macOS with the same error message (see this question) and found the following reasons:

Problem:

I had a conda environment where NumPy, SciPy and TensorFlow were installed.

Conda uses Intel(R) MKL Optimizations; see the docs:

Anaconda has packaged MKL-powered binary versions of some of the most popular numerical/scientific Python libraries into MKL Optimizations for improved performance.

The Intel MKL functions (e.g. FFT, LAPACK, BLAS) are threaded with the OpenMP technology.

But on macOS you do not need MKL, because the Accelerate Framework comes with its own optimization algorithms and already uses OpenMP. That is the reason for the error message: OMP Error #15: ...

Workaround:

You should install all packages without MKL support:

conda install nomkl

and then use

conda install numpy scipy pandas tensorflow

followed by

conda remove mkl mkl-service

For more information see conda MKL Optimizations.
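To confirm that the steps above actually removed MKL, one option is to scan the output of `conda list` for packages whose names start with `mkl`. The following is only a sketch: the helper names `mkl_packages` and `leftover_mkl_packages` are hypothetical, and it assumes the `conda` executable is on the PATH with the target environment active.

```python
import subprocess


def mkl_packages(conda_list_output):
    """Return names of listed packages whose name starts with 'mkl'."""
    names = []
    for line in conda_list_output.splitlines():
        line = line.strip()
        # Skip blank lines and the '#'-prefixed header lines of `conda list`.
        if not line or line.startswith("#"):
            continue
        name = line.split()[0]
        # 'nomkl' does not start with 'mkl', so it is not reported.
        if name.startswith("mkl"):
            names.append(name)
    return names


def leftover_mkl_packages():
    """Run `conda list` (assumed to be on PATH) and report MKL packages."""
    result = subprocess.run(
        ["conda", "list"], capture_output=True, text=True, check=True
    )
    return mkl_packages(result.stdout)
```

Calling `leftover_mkl_packages()` inside the environment should return an empty list once `mkl` and `mkl-service` are gone. A non-empty list suggests some package still pulls in MKL.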

J.E.K
  • This worked well for me. I just created a new ML environment starting with the above steps. My problem was that I could not use matplotlib plots to plot training data in the same script after the model finished training. – soporific312 May 08 '20 at 22:12
5

I solved this problem by asking an HPC server expert. This may be useful for users of the Compute Canada system.

Why does it occur?

This error is caused by a conflict between a pre-built TensorFlow Python wheel (which is specific to the Compute Canada system) and the conda environment. Quote: "conda is always a bit problematic because it downloads precompiled binaries, mileage may vary..."

How to solve it?

As @abccd pointed out "The best thing to do is to ensure that only a single OpenMP runtime is linked into the process". However, I haven't figured out how to ensure that.

So I uninstalled conda and installed everything within the module system using pip install. The network then worked fine.
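For anyone who wants to check how many OpenMP runtimes a process has loaded, one possible approach on Linux is to look at the shared libraries listed in `/proc/self/maps`. The sketch below is an illustration only: the function names are hypothetical, the file-name pattern covers just the common Intel (`libiomp5`), GNU (`libgomp`) and LLVM (`libomp`) runtimes, and a copy that is statically linked into another library will not show up.

```python
import re

# Shared-library names of common OpenMP runtimes:
# Intel (libiomp5), GNU (libgomp), LLVM (libomp).
# The optional hex suffix covers renamed copies bundled inside some wheels.
_OPENMP_LIB = re.compile(r"/lib(iomp5|gomp|omp)(-[0-9a-f]+)?\.so[^/]*$")


def openmp_runtimes(maps_text):
    """Return sorted, de-duplicated OpenMP library paths from a maps dump."""
    paths = set()
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, dev, inode, pathname (optional).
        fields = line.split(None, 5)
        if len(fields) == 6:
            path = fields[5].strip()
            if _OPENMP_LIB.search(path):
                paths.add(path)
    return sorted(paths)


def loaded_openmp_runtimes():
    """Linux only: list OpenMP runtimes mapped into the current process."""
    with open("/proc/self/maps") as f:
        return openmp_runtimes(f.read())
```

Calling `loaded_openmp_runtimes()` after the relevant libraries have been imported lists the runtimes in use. More than one distinct path in the result suggests that several OpenMP runtimes are loaded at once.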

Kunyu Shi
4

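I solved it, as explained by the message, by adding:

import os

# Unsafe workaround named in the error message; set it before importing
# any library that loads the OpenMP runtime (e.g. TensorFlow).
os.environ['KMP_DUPLICATE_LIB_OK'] = 'True'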
rsc
0

Simply downgrading my version of TensorFlow using Anaconda did it for me.

Petio Petrov