I am trying to set up a deep learning machine with dual RTX 3070 GPUs on Ubuntu 20.04. I have installed NVIDIA driver 460, CUDA 11.2 and cuDNN 8.1. When I test the GPUs with a sample TensorFlow script, I get CUDA_ERROR_ILLEGAL_ADDRESS on both GPUs. Can someone tell me what the issue is?
I hit this issue with Python 3.8 and 3.9, and with TensorFlow 2.5.0 and 2.9.0.
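A first check is whether the CUDA and cuDNN versions TensorFlow was built against agree with the installed ones. The sketch below assumes TensorFlow 2.3 or later, where `tf.sysconfig.get_build_info()` returns a dict with `cuda_version` and `cudnn_version` keys (the cuDNN entry may carry only the major version). `version_prefix_matches` and `build_matches` are hypothetical helpers, and the installed versions 11.2 and 8.1 are taken from the setup described above.

```python
"""Compare TensorFlow's build-time CUDA/cuDNN versions with the installed ones."""


def version_prefix_matches(built: str, installed: str) -> bool:
    # Compare only as many dotted components as the build info reports,
    # e.g. a build value of "8" matches an installed "8.1".
    built_parts = built.split(".")
    installed_parts = installed.split(".")
    return installed_parts[: len(built_parts)] == built_parts


def build_matches(build_info: dict, cuda: str, cudnn: str) -> bool:
    # Hypothetical helper: True only when both CUDA and cuDNN agree.
    cuda_ok = version_prefix_matches(str(build_info.get("cuda_version", "")), cuda)
    cudnn_ok = version_prefix_matches(str(build_info.get("cudnn_version", "")), cudnn)
    return cuda_ok and cudnn_ok


if __name__ == "__main__":
    try:
        import tensorflow as tf
    except ImportError:
        print("TensorFlow is not installed in this environment")
    else:
        info = tf.sysconfig.get_build_info()
        print("TF built with CUDA", info.get("cuda_version"),
              "and cuDNN", info.get("cudnn_version"))
        # Installed versions assumed from the question: CUDA 11.2, cuDNN 8.1.
        print("Matches installed CUDA 11.2 / cuDNN 8.1:",
              build_matches(info, "11.2", "8.1"))
```

A CPU-only TensorFlow build has no CUDA entries, so `build_matches` returns `False` for it rather than raising.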
Output of `nvidia-smi`:

```
Mon Jun 27 14:10:22 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.91.03    Driver Version: 460.91.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 3070    Off  | 00000000:09:00.0 Off |                  N/A |
| 57%   46C    P8    28W / 270W |     15MiB /  7979MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  GeForce RTX 3070    Off  | 00000000:0A:00.0 Off |                  N/A |
|  0%   48C    P8    23W / 270W |      5MiB /  7982MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1264      G   /usr/lib/xorg/Xorg                  9MiB |
|    0   N/A  N/A      1463      G   /usr/bin/gnome-shell                3MiB |
|    1   N/A  N/A      1264      G   /usr/lib/xorg/Xorg                  4MiB |
+-----------------------------------------------------------------------------+
```
Output of `nvcc --version`:

```
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Feb_14_21:12:58_PST_2021
Cuda compilation tools, release 11.2, V11.2.152
Build cuda_11.2.r11.2/compiler.29618528_0
```
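The `CUDA Version` in the `nvidia-smi` header is the newest CUDA version the driver supports, while `nvcc --version` reports the installed toolkit, so the two numbers can legitimately differ. A minimal sketch for pulling both out of the command output, assuming both tools are on `PATH`. The helper names are hypothetical, and the final comparison is only a conservative check: CUDA 11 minor-version compatibility may allow a newer 11.x toolkit on an older 11.x driver.

```python
"""Extract driver and toolkit versions from nvidia-smi and nvcc output."""
import re
import shutil
import subprocess


def parse_smi_header(text: str) -> dict:
    # "CUDA Version" here is the newest CUDA the driver supports,
    # not necessarily the toolkit that is installed.
    driver = re.search(r"Driver Version:\s*([\d.]+)", text)
    cuda = re.search(r"CUDA Version:\s*([\d.]+)", text)
    return {
        "driver": driver.group(1) if driver else None,
        "driver_cuda": cuda.group(1) if cuda else None,
    }


def parse_nvcc_release(text: str):
    # nvcc prints e.g. "Cuda compilation tools, release 11.2, V11.2.152".
    match = re.search(r"release\s+(\d+\.\d+)", text)
    return match.group(1) if match else None


def as_tuple(version: str) -> tuple:
    # "11.2" -> (11, 2) so versions compare numerically, not as strings.
    return tuple(int(part) for part in version.split("."))


if __name__ == "__main__":
    if shutil.which("nvidia-smi") and shutil.which("nvcc"):
        smi_out = subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout
        nvcc_out = subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout
        smi = parse_smi_header(smi_out)
        toolkit = parse_nvcc_release(nvcc_out)
        print(smi, "toolkit:", toolkit)
        if smi["driver_cuda"] and toolkit:
            # Conservative: toolkit no newer than the driver's reported CUDA.
            print("toolkit <= driver CUDA:",
                  as_tuple(toolkit) <= as_tuple(smi["driver_cuda"]))
    else:
        print("nvidia-smi or nvcc not found on PATH")
```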
Full log from running the sample code below:

```
2022-06-27 13:59:11.843491: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2022-06-27 13:59:12.901104: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2022-06-27 13:59:12.943243: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-06-27 13:59:12.943685: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:09:00.0 name: GeForce RTX 3070 computeCapability: 8.6
coreClock: 1.815GHz coreCount: 46 deviceMemorySize: 7.79GiB deviceMemoryBandwidth: 417.29GiB/s
2022-06-27 13:59:12.943725: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-06-27 13:59:12.944127: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 1 with properties:
pciBusID: 0000:0a:00.0 name: GeForce RTX 3070 computeCapability: 8.6
coreClock: 1.815GHz coreCount: 46 deviceMemorySize: 7.79GiB deviceMemoryBandwidth: 417.29GiB/s
2022-06-27 13:59:12.944141: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2022-06-27 13:59:12.945421: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2022-06-27 13:59:12.945447: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
2022-06-27 13:59:12.945900: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
2022-06-27 13:59:12.946021: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
2022-06-27 13:59:12.946360: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusolver.so.11
2022-06-27 13:59:12.946647: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.11
2022-06-27 13:59:12.946717: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2022-06-27 13:59:12.946758: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-06-27 13:59:12.947192: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-06-27 13:59:12.947610: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-06-27 13:59:12.948028: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-06-27 13:59:12.948421: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0, 1
2022-06-27 13:59:12.948662: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-06-27 13:59:13.250592: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-06-27 13:59:13.250986: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:09:00.0 name: GeForce RTX 3070 computeCapability: 8.6
coreClock: 1.815GHz coreCount: 46 deviceMemorySize: 7.79GiB deviceMemoryBandwidth: 417.29GiB/s
2022-06-27 13:59:13.251025: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-06-27 13:59:13.251384: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 1 with properties:
pciBusID: 0000:0a:00.0 name: GeForce RTX 3070 computeCapability: 8.6
coreClock: 1.815GHz coreCount: 46 deviceMemorySize: 7.79GiB deviceMemoryBandwidth: 417.29GiB/s
2022-06-27 13:59:13.251420: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-06-27 13:59:13.251794: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-06-27 13:59:13.252260: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-06-27 13:59:13.252633: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-06-27 13:59:13.252984: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0, 1
2022-06-27 13:59:13.253013: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2022-06-27 13:59:13.721007: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-06-27 13:59:13.721033: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264]      0 1
2022-06-27 13:59:13.721041: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0:   N N
2022-06-27 13:59:13.721045: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 1:   N N
2022-06-27 13:59:13.721180: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-06-27 13:59:13.721614: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-06-27 13:59:13.722002: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-06-27 13:59:13.722382: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-06-27 13:59:13.722757: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-06-27 13:59:13.723125: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6114 MB memory) -> physical GPU (device: 0, name: GeForce RTX 3070, pci bus id: 0000:09:00.0, compute capability: 8.6)
2022-06-27 13:59:13.723332: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-06-27 13:59:13.723699: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 6126 MB memory) -> physical GPU (device: 1, name: GeForce RTX 3070, pci bus id: 0000:0a:00.0, compute capability: 8.6)
2022-06-27 13:59:14.182153: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)
2022-06-27 13:59:14.201622: I tensorflow/core/platform/profile_utils/cpu_utils.cc:114] CPU Frequency: 3700310000 Hz
Epoch 1/10
2022-06-27 13:59:14.370438: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2022-06-27 13:59:14.763996: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
2022-06-27 13:59:14.764035: I tensorflow/stream_executor/cuda/cuda_blas.cc:1838] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.
1563/1563 [==============================] - 3s 1ms/step - loss: 1.8131 - accuracy: 0.3549
Epoch 2/10
 500/1563 [========>.....................] - ETA: 1s - loss: 1.6540 - accuracy: 0.4167
Traceback (most recent call last):
  File "/home/vicky/testtf/testf.py", line 24, in <module>
    model_gpu.fit(X_train_scaled, y_train_encoded, epochs = 10)
  File "/home/vicky/testtf/tf/lib/python3.9/site-packages/tensorflow/python/keras/engine/training.py", line 1188, in fit
    callbacks.on_train_batch_end(end_step, logs)
  File "/home/vicky/testtf/tf/lib/python3.9/site-packages/tensorflow/python/keras/callbacks.py", line 457, in on_train_batch_end
    self._call_batch_hook(ModeKeys.TRAIN, 'end', batch, logs=logs)
  File "/home/vicky/testtf/tf/lib/python3.9/site-packages/tensorflow/python/keras/callbacks.py", line 317, in _call_batch_hook
    self._call_batch_end_hook(mode, batch, logs)
  File "/home/vicky/testtf/tf/lib/python3.9/site-packages/tensorflow/python/keras/callbacks.py", line 337, in _call_batch_end_hook
    self._call_batch_hook_helper(hook_name, batch, logs)
  File "/home/vicky/testtf/tf/lib/python3.9/site-packages/tensorflow/python/keras/callbacks.py", line 375, in _call_batch_hook_helper
    hook(batch, logs)
  File "/home/vicky/testtf/tf/lib/python3.9/site-packages/tensorflow/python/keras/callbacks.py", line 1029, in on_train_batch_end
    self._batch_update_progbar(batch, logs)
  File "/home/vicky/testtf/tf/lib/python3.9/site-packages/tensorflow/python/keras/callbacks.py", line 1101, in _batch_update_progbar
    logs = tf_utils.sync_to_numpy_or_python_type(logs)
  File "/home/vicky/testtf/tf/lib/python3.9/site-packages/tensorflow/python/keras/utils/tf_utils.py", line 519, in sync_to_numpy_or_python_type
    return nest.map_structure(_to_single_numpy_or_python_type, tensors)
  File "/home/vicky/testtf/tf/lib/python3.9/site-packages/tensorflow/python/util/nest.py", line 867, in map_structure
    structure[0], [func(*x) for x in entries],
  File "/home/vicky/testtf/tf/lib/python3.9/site-packages/tensorflow/python/util/nest.py", line 867, in <listcomp>
    structure[0], [func(*x) for x in entries],
  File "/home/vicky/testtf/tf/lib/python3.9/site-packages/tensorflow/python/keras/utils/tf_utils.py", line 515, in _to_single_numpy_or_python_type
    x = t.numpy()
  File "/home/vicky/testtf/tf/lib/python3.9/site-packages/tensorflow/python/framework/ops.py", line 1094, in numpy
    maybe_arr = self._numpy()  # pylint: disable=protected-access
  File "/home/vicky/testtf/tf/lib/python3.9/site-packages/tensorflow/python/framework/ops.py", line 1062, in _numpy
    six.raise_from(core._status_to_exception(e.code, e.message), None)  # pylint: disable=protected-access
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InternalError: Could not synchronize CUDA stream: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
```
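CUDA kernel launches are asynchronous, so an illegal memory access is usually reported at the next synchronization point rather than at the kernel that caused it. In the traceback above, that point is the progress-bar callback converting `logs` to NumPy (`t.numpy()`), which is why the error says "Could not synchronize CUDA stream". A common debugging step is to re-run with the CUDA environment variable `CUDA_LAUNCH_BLOCKING=1`, which makes launches synchronous so the error should surface closer to the faulting op. The sketch below assumes the script path shown in the traceback; `blocking_env` and `rerun_blocking` are hypothetical helpers, and the re-run will be noticeably slower.

```python
"""Re-run a training script with synchronous CUDA kernel launches."""
import os
import subprocess
import sys


def blocking_env(base=None) -> dict:
    # Copy the environment (never mutate the caller's dict) and force
    # synchronous kernel launches for the child process.
    env = dict(os.environ if base is None else base)
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def rerun_blocking(script: str) -> int:
    # Run the script with the current interpreter and return its exit code.
    result = subprocess.run([sys.executable, script], env=blocking_env())
    return result.returncode


if __name__ == "__main__":
    # Path taken from the traceback above; adjust for your machine.
    script_path = "/home/vicky/testtf/testf.py"
    if os.path.exists(script_path):
        sys.exit(rerun_blocking(script_path))
    print("script not found:", script_path)
```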
This is the sample code I am running:

```python
import tensorflow as tf
from tensorflow import keras
import numpy as np

(X_train, y_train), (X_test, y_test) = keras.datasets.cifar10.load_data()

# scaling image values between 0-1
X_train_scaled = X_train/255
X_test_scaled = X_test/255

# one hot encoding labels
y_train_encoded = keras.utils.to_categorical(y_train, num_classes = 10, dtype = 'float32')
y_test_encoded = keras.utils.to_categorical(y_test, num_classes = 10, dtype = 'float32')

def get_model():
    model = keras.Sequential([
        keras.layers.Flatten(input_shape=(32,32,3)),
        keras.layers.Dense(3000, activation='relu'),
        keras.layers.Dense(1000, activation='relu'),
        keras.layers.Dense(10, activation='sigmoid')
    ])
    model.compile(optimizer='SGD',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model

with tf.device('/GPU:0'):
    model_gpu = get_model()
    model_gpu.fit(X_train_scaled, y_train_encoded, epochs = 10)
```