
Lately, I have been working a lot with TensorFlow and Anaconda environments. I've found that creating a new Python environment for each project is a good way to avoid cross-contamination of library versions and dependencies. I have not been applying the same discipline to environment variables in Linux, and I believe my current problem comes from having messed up a few of them.

My question up front: what is a good practice for creating a completely isolated coding environment, in terms of both Python packages and environment variables, so that a given variable, say LD_LIBRARY_PATH, contains only the paths relevant to the current project?

My current puzzle is:

Anaconda: I have two Anaconda environments, say env_a and env_b, with identical tensorflow-gpu packages installed. When I run tf.test.is_gpu_available() in env_a, it recognizes my GPU and prints the series of Info messages shown below. In env_b it fails to recognize the GPU after a series of failures loading dynamic libraries; those Warning/Info messages are shown below as well.

Environment paths: I added paths to LD_LIBRARY_PATH in my ~/.bashrc file, so they are loaded every time I start a terminal.

The reason I mention LD_LIBRARY_PATH is that it appears in the messages printed when env_b fails to recognize the GPU. Apparently LD_LIBRARY_PATH works well with env_a and badly with env_b.
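One way to compare the two environments is to check where, if anywhere, the CUDA libraries named in the logs actually exist. The following is a minimal diagnostic sketch, not a full reproduction of the dynamic linker's search: the helper names `search_dirs` and `find_shared_libs` are hypothetical, and it only scans the directories in LD_LIBRARY_PATH plus the `lib` folder of the active conda environment (taken from `CONDA_PREFIX`). It is also worth noting that in the logs below env_b looks for `libcudart.so.10.0` while env_a opens `libcudart.so.10.1`, which may indicate that the two environments do not resolve to the same CUDA version.

```python
import glob
import os


def search_dirs(environ=os.environ):
    """Directories from LD_LIBRARY_PATH plus the active conda env's lib dir."""
    # Empty entries (e.g. from a leading ':') are skipped.
    dirs = [d for d in environ.get("LD_LIBRARY_PATH", "").split(":") if d]
    prefix = environ.get("CONDA_PREFIX")  # set by `conda activate`
    if prefix:
        dirs.append(os.path.join(prefix, "lib"))
    return dirs


def find_shared_libs(stems, dirs):
    """Map each library stem (e.g. 'libcudart') to the matching files found."""
    found = {}
    for stem in stems:
        matches = []
        for d in dirs:
            matches.extend(sorted(glob.glob(os.path.join(d, stem + ".so*"))))
        found[stem] = matches
    return found


if __name__ == "__main__":
    # Library names taken from the dso_loader messages in the logs.
    stems = ["libcudart", "libcublas", "libcufft", "libcurand",
             "libcusolver", "libcusparse", "libcudnn"]
    for stem, paths in find_shared_libs(stems, search_dirs()).items():
        print(stem, "->", paths or "not found")
```

Running this once inside env_a and once inside env_b should show which versions of each library are visible from each environment.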

Failure messages of env_b

2019-09-29 16:35:29.702531: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3600000000 Hz
2019-09-29 16:35:29.703074: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x557bcdfb6460 executing computations on platform Host. Devices:
2019-09-29 16:35:29.703089: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): <undefined>, <undefined>
2019-09-29 16:35:29.703835: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2019-09-29 16:35:29.725176: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-29 16:35:29.725495: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties: 
name: GeForce GTX 1070 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.683
pciBusID: 0000:01:00.0
2019-09-29 16:35:29.725567: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/home/asheryartsev/UnrealEngine-4.18/Engine/Binaries:/home/asheryartsev/UnrealEngine-4.18/Engine/Binaries
2019-09-29 16:35:29.725613: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/home/asheryartsev/UnrealEngine-4.18/Engine/Binaries:/home/asheryartsev/UnrealEngine-4.18/Engine/Binaries
2019-09-29 16:35:29.725657: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/home/asheryartsev/UnrealEngine-4.18/Engine/Binaries:/home/asheryartsev/UnrealEngine-4.18/Engine/Binaries
2019-09-29 16:35:29.725700: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/home/asheryartsev/UnrealEngine-4.18/Engine/Binaries:/home/asheryartsev/UnrealEngine-4.18/Engine/Binaries
2019-09-29 16:35:29.725742: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/home/asheryartsev/UnrealEngine-4.18/Engine/Binaries:/home/asheryartsev/UnrealEngine-4.18/Engine/Binaries
2019-09-29 16:35:29.725785: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/home/asheryartsev/UnrealEngine-4.18/Engine/Binaries:/home/asheryartsev/UnrealEngine-4.18/Engine/Binaries
2019-09-29 16:35:29.725825: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcudnn.so.7'; dlerror: libcudnn.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/home/asheryartsev/UnrealEngine-4.18/Engine/Binaries:/home/asheryartsev/UnrealEngine-4.18/Engine/Binaries
2019-09-29 16:35:29.725833: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1663] Cannot dlopen some GPU libraries. Skipping registering GPU devices...
2019-09-29 16:35:29.788211: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-09-29 16:35:29.788236: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      0 
2019-09-29 16:35:29.788241: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0:   N 
2019-09-29 16:35:29.789433: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-29 16:35:29.789753: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x557bcfea7e30 executing computations on platform CUDA. Devices:
2019-09-29 16:35:29.789765: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): GeForce GTX 1070 Ti, Compute Capability 6.1
False

Success messages of env_a

2019-09-30 11:13:32.469410: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-09-30 11:13:32.498599: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3600000000 Hz
2019-09-30 11:13:32.499157: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55a5de3541f0 executing computations on platform Host. Devices:
2019-09-30 11:13:32.499185: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): <undefined>, <undefined>
2019-09-30 11:13:32.500030: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2019-09-30 11:13:32.502295: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-30 11:13:32.502622: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties: 
name: GeForce GTX 1070 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.683
pciBusID: 0000:01:00.0
2019-09-30 11:13:32.502735: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.1
2019-09-30 11:13:32.503876: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10
2019-09-30 11:13:32.505077: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10
2019-09-30 11:13:32.505250: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10
2019-09-30 11:13:32.506538: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10
2019-09-30 11:13:32.507201: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10
2019-09-30 11:13:32.509645: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-09-30 11:13:32.509724: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-30 11:13:32.510038: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-30 11:13:32.510289: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-09-30 11:13:32.510314: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.1
2019-09-30 11:13:32.567936: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-09-30 11:13:32.567961: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      0 
2019-09-30 11:13:32.567966: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0:   N 
2019-09-30 11:13:32.568099: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-30 11:13:32.568419: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-30 11:13:32.568706: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-30 11:13:32.568978: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/device:GPU:0 with 6822 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1070 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
2019-09-30 11:13:32.570348: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55a5e1a6e070 executing computations on platform CUDA. Devices:
2019-09-30 11:13:32.570365: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): GeForce GTX 1070 Ti, Compute Capability 6.1
True
Asher Yartsev
  • 1
    You can set env vars on a per-env basis by following these instructions. That way certain env vars will be set whenever you activate a conda env, and removed again when the conda env gets deactivated. https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#macos-and-linux – orangeInk Sep 30 '19 at 13:41
  • @orangeInk Wow, this looks like what I was looking for! But in the instructions, the activate.d/deactivate.d files don't seem to be tied to a specific environment, judging by the path. What am I missing? – Asher Yartsev Sep 30 '19 at 16:13
  • 1
    You create those files/folders inside one of your conda environment folders and those settings will only apply to that env. Someone wrote up a more complete answer here: https://stackoverflow.com/a/46833531/6793245 – orangeInk Oct 01 '19 at 06:46
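The per-environment approach described in the comments can be sketched as follows. This is a minimal illustration under stated assumptions, not an official conda tool: `write_env_hooks` is a hypothetical helper, the file name `env_vars.sh` follows the convention used in the linked conda documentation, and the `OLD_LD_LIBRARY_PATH` backup variable is just one way to restore the previous value on deactivation. The scripts are written under `etc/conda/activate.d` and `etc/conda/deactivate.d` inside the given environment folder, which is why they apply to that environment only.

```python
import os

# Shell script run by conda when the environment is activated.
ACTIVATE = """#!/bin/sh
# Remember the previous value so deactivation can restore it.
export OLD_LD_LIBRARY_PATH="${LD_LIBRARY_PATH}"
# Prepend the project-specific directory; add ':' only if a value already exists.
export LD_LIBRARY_PATH="@LIB_DIR@${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
"""

# Shell script run by conda when the environment is deactivated.
DEACTIVATE = """#!/bin/sh
# Restore the value saved at activation time.
export LD_LIBRARY_PATH="${OLD_LD_LIBRARY_PATH}"
unset OLD_LD_LIBRARY_PATH
"""


def write_env_hooks(env_prefix, lib_dir):
    """Write activate/deactivate hooks into one conda environment folder.

    env_prefix: path of the environment (the value of CONDA_PREFIX when active).
    lib_dir:    directory to prepend to LD_LIBRARY_PATH for this environment.
    Returns a dict mapping hook folder name to the script path written.
    """
    scripts = {
        "activate.d": ACTIVATE.replace("@LIB_DIR@", lib_dir),
        "deactivate.d": DEACTIVATE,
    }
    written = {}
    for hook, body in scripts.items():
        hook_dir = os.path.join(env_prefix, "etc", "conda", hook)
        os.makedirs(hook_dir, exist_ok=True)
        path = os.path.join(hook_dir, "env_vars.sh")
        with open(path, "w") as fh:
            fh.write(body)
        written[hook] = path
    return written
```

For example, calling `write_env_hooks(os.environ["CONDA_PREFIX"], "/usr/local/cuda-10.0/lib64")` from inside the activated env_b would set up the hooks for that environment; the CUDA path here is a placeholder and has to be replaced with wherever the required libraries actually live. The environment must be deactivated and activated again for the hooks to take effect.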
