- I am using this Amazon Linux 2 AMI with NVIDIA TESLA GPU Driver
$ cat /etc/*-release
NAME="Amazon Linux"
VERSION="2"
ID="amzn"
ID_LIKE="centos rhel fedora"
VERSION_ID="2"
PRETTY_NAME="Amazon Linux 2"
ANSI_COLOR="0;33"
CPE_NAME="cpe:2.3:o:amazon:amazon_linux:2"
HOME_URL="https://amazonlinux.com/"
Amazon Linux release 2 (Karoo)
- It came out of the box with cuda 9.2 (nvcc --version) and driver R450
- I am using cmake 3.26 to create a shared library (.so) that has all of my kernels.
- Now, I have been trying to debug with its corresponding cuda-gdb but it seems it lacks some functionality to stop in my kernel breakpoints:
NVIDIA (R) CUDA Debugger
9.2 release
Portions Copyright (C) 2007-2018 NVIDIA Corporation
GNU gdb (GDB) 7.12
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word".
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
...
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
...
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
0x00007fc5a6a183dc in select () from /lib64/libc.so.6
I tried:
I could not find an AMI with a newer CUDA Toolkit, so I installed the cuda toolkit 12.2 (latest now) and avoided the driver. I saw in the official compatibility table that cuda 12 may run using my current driver:
$ nvidia-smi
Thu Jul 20 20:29:10 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.216.04 Driver Version: 450.216.04 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla K80 On | 00000000:00:1E.0 Off | 0 |
| N/A 38C P8 29W / 149W | 0MiB / 11441MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
ok. I rebuilt my project and when I try to debug I see that I effectively need the new driver
NVIDIA (R) CUDA Debugger
CUDA Toolkit 12.2 release
Portions Copyright (C) 2007-2023 NVIDIA Corporation
GNU gdb (GDB) 12.1
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word".
Using python library libpython3.7m.so
No source file named AAA.cpp.
No source file named AAA.cpp.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[Detaching after fork from child process 5732]
Incompatible CUDA driver version. Expected 12.2.134 or later and found 8.0.129 instead.&"fatal: Incompatible CUDA driver version. (error code = CUDBG_ERROR_INCOMPATIBLE_API(0x13)\n"
While installing again the CUDA Toolkit (now selecting the driver) I see the log:
[INFO]: Driver not installed.
[INFO]: Checking compiler version...
[INFO]: gcc location: /bin/gcc
[INFO]: gcc version: gcc version 7.3.1 20180712 (Red Hat 7.3.1-15) (GCC)
[INFO]: Initializing menu
[INFO]: nvidia-fs.setKOVersion(2.16.1)
[INFO]: Setup complete
[INFO]: Installing: Driver
[INFO]: Installing: 535.54.03
[INFO]: Executing NVIDIA-Linux-x86_64-535.54.03.run --ui=none --no-questions --accept-license --disable-nouveau --no-cc-version-check --install-libglvnd 2>&1
[INFO]: Finished with code: 3840
[ERROR]: Install of driver component failed. Consult the driver log at /var/log/nvidia-installer.log for more details.
[ERROR]: Install of 535.54.03 failed, quitting
later, /var/log/nvidia-installer.log is empty.
I suspect some protection from the Amazon distro.
question: how may I debug my code? I am stuck at installing the updated driver (to get newer functionality), but maybe it is unnecessary.
PD:
- vscode debug launch
{
"name": "mytests",
"type": "cuda-gdb",
"request": "launch",
"program": "${workspaceFolder}/build/myTests",
},
- deleting urls to avoid being flagged as spam
update 1
So, I finally got another VM : Deep Learning AMI Amazon Linux 2 with Support by Supported Images By: Supported Images Latest Version: 20210714
I have rebuilt the shared library correctly.
$ nvidia-smi
Fri Jul 21 18:42:30 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.119.03 Driver Version: 450.119.03 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla K80 On | 00000000:00:1E.0 Off | 0 |
| N/A 36C P8 28W / 149W | 0MiB / 11441MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
and now again I am finding that cuda-gdb hangs:
NVIDIA (R) CUDA Debugger
10.0 release
Portions Copyright (C) 2007-2018 NVIDIA Corporation
GNU gdb (GDB) 7.12
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word".
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
[New LWP 2701]
[New LWP 2702]
[New LWP 2710]
[New LWP 2711]
[New LWP 2731]
[New LWP 2732]
[New LWP 2733]
attached to process 2693
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
0x00007fbd4e4f5f6c in select () from /lib64/libc.so.6
- now it looks like "Cuda-gdb hangs indefinitely" https://forums.developer.nvidia.com/t/cuda-gdb-hangs-indefinitely/217087
I will try to explain how I use de debugger. The main app is Python, using a shared library built with cmake and cuda. The tests are smaller scripts cuda-only. Anyway, running without the debugger it does not hang. While running my Python app I use:
- launch my python app and stop at some break-point. (wherever in the code I want it to be paused)
- cuda attach, then select the process.
- at pressing continue (vscode F5) it hangs.
Finally:
- Comments on checking this unsolved issue would be appreciated.
- I think I may discard the LZMA tags.
- Maybe I should use some printf() or write content to a text file in order to understand what threads are doing.
- I may show some cmake code if needed.