
I installed libgpuarray v0.7.4. When I import theano, I get the error shown below. I am using the following versions:

(1) Theano from GitHub (rel-1.0.0rc1),

(2) CUDA 9.0 (I am sure CUDA / cuSOLVER works: I installed pycuda and scikit-cuda and could run both successfully)

(3) cuDNN 7.0.3

(4) nvidia driver 384.90

Python 3.5.2 (default, Sep 14 2017, 22:51:06) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import theano
Using cuDNN version 7003 on context None
ERROR (theano.gpuarray): Could not initialize pygpu, support disabled
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/Theano-1.0.0rc1-py3.5.egg/theano/gpuarray/__init__.py", line 220, in <module>
    use(config.device)
  File "/usr/local/lib/python3.5/dist-packages/Theano-1.0.0rc1-py3.5.egg/theano/gpuarray/__init__.py", line 207, in use
    init_dev(device, preallocate=preallocate)
  File "/usr/local/lib/python3.5/dist-packages/Theano-1.0.0rc1-py3.5.egg/theano/gpuarray/__init__.py", line 152, in init_dev
    pygpu.blas.gemm(0, tmp, tmp, 0, tmp, overwrite_c=True)
  File "pygpu/blas.pyx", line 149, in pygpu.blas.gemm
  File "pygpu/blas.pyx", line 47, in pygpu.blas.pygpu_blas_rgemm
pygpu.gpuarray.GpuArrayException: (b'cuLinkCreate: CUDA_ERROR_JIT_COMPILER_NOT_FOUND: PTX JIT compiler library not found', 3)

Any idea how to resolve this? I want to use Theano's GPU support.

EDIT (reply to talonmies)

I think pygpu can create a context on my GPU:

$ DEVICE=cuda python3
Python 3.5.2 (default, Sep 14 2017, 22:51:06) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pygpu
>>> pygpu.test()
pygpu is installed in /usr/local/lib/python3.5/dist-packages/pygpu-0.7.4-py3.5-linux-x86_64.egg/pygpu
NumPy version 1.13.3
NumPy relaxed strides checking option: True
NumPy is installed in /usr/local/lib/python3.5/dist-packages/numpy-1.13.3-py3.5-linux-x86_64.egg/numpy
Python version 3.5.2 (default, Sep 14 2017, 22:51:06) [GCC 5.4.0 20160609]
nose version 1.3.7
*** Testing for GeForce GTX TITAN Black
mpi4py found: False
.EEEEEEEEEEEEEEEEEEEEEEEEEEE

Meanwhile, nvidia-smi reports:

Mon Nov 13 15:39:35 2017       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.90                 Driver Version: 384.90                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX TIT...  Off  | 00000000:01:00.0 Off |                  N/A |
| 28%   47C    P2    94W / 250W |    102MiB /  6082MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      5562      C   python3                                       91MiB |
+-----------------------------------------------------------------------------+

I traced the code and found that the error comes from the BLAS GEMM call. I think there is no precompiled code for my GPU, so libgpuarray needs to JIT-compile it on my machine, but somehow it cannot find the PTX JIT compiler library.
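One way to narrow this down is to ask the dynamic loader directly whether it can open the PTX JIT compiler library. The sketch below is only an illustration: the two library names are taken from the answer further down in this thread rather than from NVIDIA documentation, and `can_load` and `report` are hypothetical helper names.

```python
import ctypes

# Assumption: these are the names the loader is asked for. They come from
# the answer in this thread, not from official NVIDIA documentation.
CANDIDATES = ("libnvidia-ptxjitcompiler.so.1", "libnvidia-ptxjitcompiler.so")


def can_load(soname):
    """Return True if the dynamic loader can open `soname`, else False."""
    try:
        ctypes.CDLL(soname)
    except OSError:
        # Raised when the library is missing or cannot be loaded.
        return False
    return True


def report(names=CANDIDATES):
    """Map each candidate library name to whether it could be loaded."""
    return {name: can_load(name) for name in names}


if __name__ == "__main__":
    for name, ok in report().items():
        print(name, "loadable" if ok else "NOT found by the loader")
```

If both names come back as not found while a versioned file such as libnvidia-ptxjitcompiler.so.384.90 exists on disk, that would be consistent with the missing-symlink explanation rather than a problem in Theano itself.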

END OF EDIT

wh0
  • "Using cuDNN version 7003 on context None" is probably the main clue there. It seems the Theano installation you are using can't even establish a context on the GPU. After that everything will be broken – talonmies Nov 13 '17 at 07:27
  • I think it can. The problem seems to be that libgpuarray (or cuBLAS) is unable to build the code for my GPU – wh0 Nov 13 '17 at 07:43
  • CUBLAS is a statically linked runtime API library, it doesn't use JIT compilation at all. So this is definitely something internal to the Python frameworks you are using, or something strange about the way you built or are running them – talonmies Nov 13 '17 at 07:56
  • Hmm... then I am not sure. I have submitted an issue on GitHub. – wh0 Nov 13 '17 at 08:02

1 Answer

I was experiencing a similar issue. To fix it, I needed to create the symlinks

libnvidia-ptxjitcompiler.so.1 libnvidia-ptxjitcompiler.so

both pointing to the versioned library file

libnvidia-ptxjitcompiler.so.384.81
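As a rough sketch of that fix, the symlinks could be created with a small Python script like the one below. The directory holding the driver libraries is not stated in this thread, so `lib_dir` is a placeholder you would have to fill in, and `plan_links` and `create_links` are hypothetical helper names. The function defaults to a dry run and never overwrites an existing file.

```python
import os


def plan_links(versioned_name):
    """Derive the link names from e.g. 'libfoo.so.384.81'.

    Returns ['libfoo.so.1', 'libfoo.so'].
    """
    base = versioned_name.split(".so.")[0] + ".so"
    return [base + ".1", base]


def create_links(lib_dir, versioned_name, dry_run=True):
    """Create missing symlinks in `lib_dir` that point at `versioned_name`.

    Returns the link paths that were (or, in a dry run, would be) created.
    """
    target = os.path.join(lib_dir, versioned_name)
    if not os.path.isfile(target):
        raise FileNotFoundError(target)
    created = []
    for link_name in plan_links(versioned_name):
        link_path = os.path.join(lib_dir, link_name)
        if os.path.lexists(link_path):
            continue  # leave anything that already exists alone
        if not dry_run:
            # Relative link, so it stays valid if the directory is moved.
            os.symlink(versioned_name, link_path)
        created.append(link_path)
    return created
```

Calling `create_links(lib_dir, "libnvidia-ptxjitcompiler.so.384.81")` first with the default `dry_run=True` shows what would be created. Writing into a system library directory normally requires root privileges.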

Kevin Lee
  • Those symlinks are normally created by the *-dev or *-devel packages. If you used packages to install the NVIDIA software, try installing those dev packages too. That is a better fix than adding the symlinks manually. – nouiz Jan 11 '18 at 19:09