I have GTX 1080 Ti GPUs with CUDA 10.2, NVIDIA driver 440.59, and pynvml 11.4.1, running on Ubuntu 16.04.

I am trying to profile my PyTorch code with scalene. When I run it as scalene main.py, I get the following error:

Error in program being profiled:
 Function Not Found
Traceback (most recent call last):
  File "/home/kube-admin/miniconda3/envs/temporl/lib/python3.8/site-packages/pynvml/nvml.py", line 782, in _nvmlGetFunctionPointer
    _nvmlGetFunctionPointer_cache[name] = getattr(nvmlLib, name)
  File "/home/kube-admin/miniconda3/envs/temporl/lib/python3.8/ctypes/__init__.py", line 386, in __getattr__
    func = self.__getitem__(name)
  File "/home/kube-admin/miniconda3/envs/temporl/lib/python3.8/ctypes/__init__.py", line 391, in __getitem__
    func = self._FuncPtr((name_or_ordinal, self))
AttributeError: /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1: undefined symbol: nvmlDeviceGetComputeRunningProcesses_v2

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/kube-admin/miniconda3/envs/temporl/lib/python3.8/site-packages/scalene/scalene_profiler.py", line 1612, in profile_code
    exec(code, the_globals, the_locals)
  File "./code-exp/main.py", line 1, in <module>
    import numpy as np
  File "/home/kube-admin/miniconda3/envs/temporl/lib/python3.8/site-packages/numpy/__init__.py", line 140, in <module>
    from . import core
  File "/home/kube-admin/miniconda3/envs/temporl/lib/python3.8/site-packages/numpy/core/__init__.py", line 22, in <module>
    from . import multiarray
  File "/home/kube-admin/miniconda3/envs/temporl/lib/python3.8/site-packages/numpy/core/multiarray.py", line 12, in <module>
    from . import overrides
  File "/home/kube-admin/miniconda3/envs/temporl/lib/python3.8/site-packages/numpy/core/overrides.py", line 9, in <module>
    from numpy.compat._inspect import getargspec
  File "/home/kube-admin/miniconda3/envs/temporl/lib/python3.8/site-packages/numpy/compat/__init__.py", line 14, in <module>
    from .py3k import *
  File "/home/kube-admin/miniconda3/envs/temporl/lib/python3.8/site-packages/scalene/scalene_profiler.py", line 719, in cpu_signal_handler
    (gpu_load, gpu_mem_used) = Scalene.__gpu.get_stats()
  File "/home/kube-admin/miniconda3/envs/temporl/lib/python3.8/site-packages/scalene/scalene_gpu.py", line 110, in get_stats
    mem_used = self.gpu_memory_usage(self.__pid)
  File "/home/kube-admin/miniconda3/envs/temporl/lib/python3.8/site-packages/scalene/scalene_gpu.py", line 101, in gpu_memory_usage
    for proc in pynvml.nvmlDeviceGetComputeRunningProcesses(handle):
  File "/home/kube-admin/miniconda3/envs/temporl/lib/python3.8/site-packages/pynvml/nvml.py", line 2223, in nvmlDeviceGetComputeRunningProcesses
    return nvmlDeviceGetComputeRunningProcesses_v2(handle);
  File "/home/kube-admin/miniconda3/envs/temporl/lib/python3.8/site-packages/pynvml/nvml.py", line 2191, in nvmlDeviceGetComputeRunningProcesses_v2
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetComputeRunningProcesses_v2")
  File "/home/kube-admin/miniconda3/envs/temporl/lib/python3.8/site-packages/pynvml/nvml.py", line 785, in _nvmlGetFunctionPointer
    raise NVMLError(NVML_ERROR_FUNCTION_NOT_FOUND)
pynvml.nvml.NVMLError_FunctionNotFound: Function Not Found

To check whether this issue comes from the scalene library itself, I ran the following commands in a Python shell:

>>> from pynvml import *
>>> nvmlInit()
>>> nvmlSystemGetDriverVersion()
b'440.59'
>>> handle = nvmlDeviceGetHandleByIndex(0)
>>> nvmlDeviceGetComputeRunningProcesses(handle)
Traceback (most recent call last):
  File "/home/kube-admin/miniconda3/envs/temporl/lib/python3.8/site-packages/pynvml/nvml.py", line 782, in _nvmlGetFunctionPointer
    _nvmlGetFunctionPointer_cache[name] = getattr(nvmlLib, name)
  File "/home/kube-admin/miniconda3/envs/temporl/lib/python3.8/ctypes/__init__.py", line 386, in __getattr__
    func = self.__getitem__(name)
  File "/home/kube-admin/miniconda3/envs/temporl/lib/python3.8/ctypes/__init__.py", line 391, in __getitem__
    func = self._FuncPtr((name_or_ordinal, self))
AttributeError: /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1: undefined symbol: nvmlDeviceGetComputeRunningProcesses_v2

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/kube-admin/miniconda3/envs/temporl/lib/python3.8/site-packages/pynvml/nvml.py", line 2223, in nvmlDeviceGetComputeRunningProcesses
    return nvmlDeviceGetComputeRunningProcesses_v2(handle);
  File "/home/kube-admin/miniconda3/envs/temporl/lib/python3.8/site-packages/pynvml/nvml.py", line 2191, in nvmlDeviceGetComputeRunningProcesses_v2
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetComputeRunningProcesses_v2")
  File "/home/kube-admin/miniconda3/envs/temporl/lib/python3.8/site-packages/pynvml/nvml.py", line 785, in _nvmlGetFunctionPointer
    raise NVMLError(NVML_ERROR_FUNCTION_NOT_FOUND)
pynvml.nvml.NVMLError_FunctionNotFound: Function Not Found
>>> nvmlDeviceGetGraphicsRunningProcesses(handle)
Traceback (most recent call last):
  File "/home/kube-admin/miniconda3/envs/temporl/lib/python3.8/site-packages/pynvml/nvml.py", line 782, in _nvmlGetFunctionPointer
    _nvmlGetFunctionPointer_cache[name] = getattr(nvmlLib, name)
  File "/home/kube-admin/miniconda3/envs/temporl/lib/python3.8/ctypes/__init__.py", line 386, in __getattr__
    func = self.__getitem__(name)
  File "/home/kube-admin/miniconda3/envs/temporl/lib/python3.8/ctypes/__init__.py", line 391, in __getitem__
    func = self._FuncPtr((name_or_ordinal, self))
AttributeError: /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1: undefined symbol: nvmlDeviceGetGraphicsRunningProcesses_v2

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/kube-admin/miniconda3/envs/temporl/lib/python3.8/site-packages/pynvml/nvml.py", line 2260, in nvmlDeviceGetGraphicsRunningProcesses
    return nvmlDeviceGetGraphicsRunningProcesses_v2(handle)
  File "/home/kube-admin/miniconda3/envs/temporl/lib/python3.8/site-packages/pynvml/nvml.py", line 2228, in nvmlDeviceGetGraphicsRunningProcesses_v2
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetGraphicsRunningProcesses_v2")
  File "/home/kube-admin/miniconda3/envs/temporl/lib/python3.8/site-packages/pynvml/nvml.py", line 785, in _nvmlGetFunctionPointer
    raise NVMLError(NVML_ERROR_FUNCTION_NOT_FOUND)
pynvml.nvml.NVMLError_FunctionNotFound: Function Not Found
>>> list(map(str, nvmlDeviceGetGraphicsRunningProcesses(handle)))
Traceback (most recent call last):
  File "/home/kube-admin/miniconda3/envs/temporl/lib/python3.8/site-packages/pynvml/nvml.py", line 782, in _nvmlGetFunctionPointer
    _nvmlGetFunctionPointer_cache[name] = getattr(nvmlLib, name)
  File "/home/kube-admin/miniconda3/envs/temporl/lib/python3.8/ctypes/__init__.py", line 386, in __getattr__
    func = self.__getitem__(name)
  File "/home/kube-admin/miniconda3/envs/temporl/lib/python3.8/ctypes/__init__.py", line 391, in __getitem__
    func = self._FuncPtr((name_or_ordinal, self))
AttributeError: /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1: undefined symbol: nvmlDeviceGetGraphicsRunningProcesses_v2

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/kube-admin/miniconda3/envs/temporl/lib/python3.8/site-packages/pynvml/nvml.py", line 2260, in nvmlDeviceGetGraphicsRunningProcesses
    return nvmlDeviceGetGraphicsRunningProcesses_v2(handle)
  File "/home/kube-admin/miniconda3/envs/temporl/lib/python3.8/site-packages/pynvml/nvml.py", line 2228, in nvmlDeviceGetGraphicsRunningProcesses_v2
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetGraphicsRunningProcesses_v2")
  File "/home/kube-admin/miniconda3/envs/temporl/lib/python3.8/site-packages/pynvml/nvml.py", line 785, in _nvmlGetFunctionPointer
    raise NVMLError(NVML_ERROR_FUNCTION_NOT_FOUND)
pynvml.nvml.NVMLError_FunctionNotFound: Function Not Found
>>> nvmlDeviceGetComputeRunningProcesses_v2(handle)
Traceback (most recent call last):
  File "/home/kube-admin/miniconda3/envs/temporl/lib/python3.8/site-packages/pynvml/nvml.py", line 782, in _nvmlGetFunctionPointer
    _nvmlGetFunctionPointer_cache[name] = getattr(nvmlLib, name)
  File "/home/kube-admin/miniconda3/envs/temporl/lib/python3.8/ctypes/__init__.py", line 386, in __getattr__
    func = self.__getitem__(name)
  File "/home/kube-admin/miniconda3/envs/temporl/lib/python3.8/ctypes/__init__.py", line 391, in __getitem__
    func = self._FuncPtr((name_or_ordinal, self))
AttributeError: /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1: undefined symbol: nvmlDeviceGetComputeRunningProcesses_v2

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/kube-admin/miniconda3/envs/temporl/lib/python3.8/site-packages/pynvml/nvml.py", line 2191, in nvmlDeviceGetComputeRunningProcesses_v2
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetComputeRunningProcesses_v2")
  File "/home/kube-admin/miniconda3/envs/temporl/lib/python3.8/site-packages/pynvml/nvml.py", line 785, in _nvmlGetFunctionPointer
    raise NVMLError(NVML_ERROR_FUNCTION_NOT_FOUND)
pynvml.nvml.NVMLError_FunctionNotFound: Function Not Found

So the error seems to come from pynvml itself rather than from scalene, but I am not sure why.
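
The undefined symbol lines suggest that the driver's NVML library simply does not export the _v2 entry points that pynvml 11.4.1 binds to (as far as I can tell, those were added with the CUDA 11 era drivers). A quick ctypes probe, a sketch using the library path from the tracebacks, shows which symbols are actually exported:

import ctypes

# Probe which NVML entry points the installed driver library exports
# (path taken from the tracebacks above).
lib = ctypes.CDLL("/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1")
for name in ("nvmlDeviceGetComputeRunningProcesses",
             "nvmlDeviceGetComputeRunningProcesses_v2",
             "nvmlDeviceGetGraphicsRunningProcesses",
             "nvmlDeviceGetGraphicsRunningProcesses_v2"):
    try:
        getattr(lib, name)
        print(name, "-> present")
    except AttributeError:
        print(name, "-> missing")

On this driver I would expect the v1 names to print as present and the _v2 names as missing.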

For reference, here is the nvidia-smi output:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.59       Driver Version: 440.59       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:02:00.0 Off |                  N/A |
| 23%   60C    P0    64W / 250W |      0MiB / 11177MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108...  Off  | 00000000:03:00.0 Off |                  N/A |
| 24%   60C    P0    63W / 250W |      0MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX 108...  Off  | 00000000:83:00.0 Off |                  N/A |
| 23%   58C    P0    57W / 250W |      0MiB / 11178MiB |      3%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
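
Since I need to stay on CUDA 10.2 (and therefore on this driver) to reproduce my PyTorch model results, could calling the v1 entry point directly be a stopgap? Below is a sketch of what I mean, untested, and assuming the v1 struct layout (just pid and usedGpuMemory) from the CUDA 10.2 era nvml.h:

import ctypes

# Assumed v1 layout of nvmlProcessInfo_t: pid + usedGpuMemory only.
# The _v2 variant uses a larger struct, hence the new symbol name.
class NvmlProcessInfoV1(ctypes.Structure):
    _fields_ = [("pid", ctypes.c_uint),
                ("usedGpuMemory", ctypes.c_ulonglong)]

nvml = ctypes.CDLL("/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1")
assert nvml.nvmlInit_v2() == 0

handle = ctypes.c_void_p()
assert nvml.nvmlDeviceGetHandleByIndex_v2(0, ctypes.byref(handle)) == 0

# NVML two-call protocol: the first call with count=0 either succeeds
# (no running processes) or reports the required array size in count.
count = ctypes.c_uint(0)
nvml.nvmlDeviceGetComputeRunningProcesses(handle, ctypes.byref(count), None)

procs = (NvmlProcessInfoV1 * max(count.value, 1))()
assert nvml.nvmlDeviceGetComputeRunningProcesses(
    handle, ctypes.byref(count), procs) == 0

for p in procs[:count.value]:
    print(p.pid, p.usedGpuMemory)

nvml.nvmlShutdown()
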
  • Update your GPU driver to the latest version. pynvml 11.4.1 [expects](https://pypi.org/project/pynvml/#description) a driver install that is consistent with CUDA 11.4. – Robert Crovella Sep 03 '22 at 15:13
  • Is there a mapping between pynvml versions and CUDA driver versions that I can refer to? To reproduce my PyTorch model results, the constraint is to use CUDA 10.2. – user529295 Sep 04 '22 at 17:45
  • Is there a problem with just updating your GPU driver to the latest one? Anyhow, the relationship between the minimum driver version and the CUDA version is listed in the CUDA toolkit [release notes](https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html); refer to Table 3, not Table 2. You may also want to study [this](https://stackoverflow.com/questions/53422407/different-cuda-versions-shown-by-nvcc-and-nvidia-smi) so that you can learn that CUDA 10.2 can run with any driver that advertises 10.2 **or newer**. – Robert Crovella Sep 04 '22 at 21:12
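
If updating the driver really is what pynvml 11.4.1 requires, a defensive fallback can at least keep polling code from crashing on older drivers. A hypothetical helper (compute_procs is my own name, not a pynvml or scalene API):

import pynvml

def compute_procs(handle):
    # Hypothetical fallback: treat pre-CUDA-11 drivers that lack the
    # _v2 NVML entry point as having no visible compute processes,
    # instead of letting NVMLError_FunctionNotFound propagate.
    try:
        return pynvml.nvmlDeviceGetComputeRunningProcesses(handle)
    except pynvml.NVMLError_FunctionNotFound:
        return []

Alternatively, pinning an older pynvml release that still binds the v1 names (for example pip install "pynvml<11") might sidestep the problem, though I have not verified which release last used them.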

0 Answers