Tensorflow - Could not synchronize CUDA stream: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered

Question

I am trying to set up a deep learning machine with dual RTX 3070 GPUs on Ubuntu 20.04. I have installed NVIDIA driver 460, CUDA 11.2, and cuDNN 8.1. When I test the GPUs with sample TensorFlow code, I get CUDA_ERROR_ILLEGAL_ADDRESS on both GPUs. Can someone tell me what the issue is?

I hit this issue with Python 3.8 and 3.9, and with TensorFlow 2.5.0 and 2.9.0.

Mon Jun 27 14:10:22 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.91.03    Driver Version: 460.91.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 3070    Off  | 00000000:09:00.0 Off |                  N/A |
| 57%   46C    P8    28W / 270W |     15MiB /  7979MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  GeForce RTX 3070    Off  | 00000000:0A:00.0 Off |                  N/A |
|  0%   48C    P8    23W / 270W |      5MiB /  7982MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1264      G   /usr/lib/xorg/Xorg                  9MiB |
|    0   N/A  N/A      1463      G   /usr/bin/gnome-shell                3MiB |
|    1   N/A  N/A      1264      G   /usr/lib/xorg/Xorg                  4MiB |
+-----------------------------------------------------------------------------+

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Feb_14_21:12:58_PST_2021
Cuda compilation tools, release 11.2, V11.2.152
Build cuda_11.2.r11.2/compiler.29618528_0

2022-06-27 13:59:11.843491: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2022-06-27 13:59:12.901104: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2022-06-27 13:59:12.943243: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-06-27 13:59:12.943685: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:09:00.0 name: GeForce RTX 3070 computeCapability: 8.6
coreClock: 1.815GHz coreCount: 46 deviceMemorySize: 7.79GiB deviceMemoryBandwidth: 417.29GiB/s
2022-06-27 13:59:12.943725: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-06-27 13:59:12.944127: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 1 with properties:
pciBusID: 0000:0a:00.0 name: GeForce RTX 3070 computeCapability: 8.6
coreClock: 1.815GHz coreCount: 46 deviceMemorySize: 7.79GiB deviceMemoryBandwidth: 417.29GiB/s
2022-06-27 13:59:12.944141: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2022-06-27 13:59:12.945421: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2022-06-27 13:59:12.945447: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
2022-06-27 13:59:12.945900: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
2022-06-27 13:59:12.946021: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
2022-06-27 13:59:12.946360: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusolver.so.11
2022-06-27 13:59:12.946647: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.11
2022-06-27 13:59:12.946717: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2022-06-27 13:59:12.946758: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-06-27 13:59:12.947192: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-06-27 13:59:12.947610: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-06-27 13:59:12.948028: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-06-27 13:59:12.948421: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0, 1
2022-06-27 13:59:12.948662: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-06-27 13:59:13.250592: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-06-27 13:59:13.250986: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:09:00.0 name: GeForce RTX 3070 computeCapability: 8.6
coreClock: 1.815GHz coreCount: 46 deviceMemorySize: 7.79GiB deviceMemoryBandwidth: 417.29GiB/s
2022-06-27 13:59:13.251025: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-06-27 13:59:13.251384: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 1 with properties:
pciBusID: 0000:0a:00.0 name: GeForce RTX 3070 computeCapability: 8.6
coreClock: 1.815GHz coreCount: 46 deviceMemorySize: 7.79GiB deviceMemoryBandwidth: 417.29GiB/s
2022-06-27 13:59:13.251420: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-06-27 13:59:13.251794: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-06-27 13:59:13.252260: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-06-27 13:59:13.252633: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-06-27 13:59:13.252984: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0, 1
2022-06-27 13:59:13.253013: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2022-06-27 13:59:13.721007: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-06-27 13:59:13.721033: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264]      0 1
2022-06-27 13:59:13.721041: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0:   N N
2022-06-27 13:59:13.721045: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 1:   N N
2022-06-27 13:59:13.721180: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-06-27 13:59:13.721614: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-06-27 13:59:13.722002: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-06-27 13:59:13.722382: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-06-27 13:59:13.722757: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-06-27 13:59:13.723125: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6114 MB memory) -> physical GPU (device: 0, name: GeForce RTX 3070, pci bus id: 0000:09:00.0, compute capability: 8.6)
2022-06-27 13:59:13.723332: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-06-27 13:59:13.723699: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 6126 MB memory) -> physical GPU (device: 1, name: GeForce RTX 3070, pci bus id: 0000:0a:00.0, compute capability: 8.6)
2022-06-27 13:59:14.182153: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)
2022-06-27 13:59:14.201622: I tensorflow/core/platform/profile_utils/cpu_utils.cc:114] CPU Frequency: 3700310000 Hz
Epoch 1/10
2022-06-27 13:59:14.370438: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2022-06-27 13:59:14.763996: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
2022-06-27 13:59:14.764035: I tensorflow/stream_executor/cuda/cuda_blas.cc:1838] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.
1563/1563 [==============================] - 3s 1ms/step - loss: 1.8131 - accuracy: 0.3549
Epoch 2/10
 500/1563 [========>.....................] - ETA: 1s - loss: 1.6540 - accuracy: 0.4167
Traceback (most recent call last):
  File "/home/vicky/testtf/testf.py", line 24, in <module>
    model_gpu.fit(X_train_scaled, y_train_encoded, epochs = 10)
  File "/home/vicky/testtf/tf/lib/python3.9/site-packages/tensorflow/python/keras/engine/training.py", line 1188, in fit
    callbacks.on_train_batch_end(end_step, logs)
  File "/home/vicky/testtf/tf/lib/python3.9/site-packages/tensorflow/python/keras/callbacks.py", line 457, in on_train_batch_end
    self._call_batch_hook(ModeKeys.TRAIN, 'end', batch, logs=logs)
  File "/home/vicky/testtf/tf/lib/python3.9/site-packages/tensorflow/python/keras/callbacks.py", line 317, in _call_batch_hook
    self._call_batch_end_hook(mode, batch, logs)
  File "/home/vicky/testtf/tf/lib/python3.9/site-packages/tensorflow/python/keras/callbacks.py", line 337, in _call_batch_end_hook
    self._call_batch_hook_helper(hook_name, batch, logs)
  File "/home/vicky/testtf/tf/lib/python3.9/site-packages/tensorflow/python/keras/callbacks.py", line 375, in _call_batch_hook_helper
    hook(batch, logs)
  File "/home/vicky/testtf/tf/lib/python3.9/site-packages/tensorflow/python/keras/callbacks.py", line 1029, in on_train_batch_end
    self._batch_update_progbar(batch, logs)
  File "/home/vicky/testtf/tf/lib/python3.9/site-packages/tensorflow/python/keras/callbacks.py", line 1101, in _batch_update_progbar
    logs = tf_utils.sync_to_numpy_or_python_type(logs)
  File "/home/vicky/testtf/tf/lib/python3.9/site-packages/tensorflow/python/keras/utils/tf_utils.py", line 519, in sync_to_numpy_or_python_type
    return nest.map_structure(_to_single_numpy_or_python_type, tensors)
  File "/home/vicky/testtf/tf/lib/python3.9/site-packages/tensorflow/python/util/nest.py", line 867, in map_structure
    structure[0], [func(*x) for x in entries],
  File "/home/vicky/testtf/tf/lib/python3.9/site-packages/tensorflow/python/util/nest.py", line 867, in <listcomp>
    structure[0], [func(*x) for x in entries],
  File "/home/vicky/testtf/tf/lib/python3.9/site-packages/tensorflow/python/keras/utils/tf_utils.py", line 515, in _to_single_numpy_or_python_type
    x = t.numpy()
  File "/home/vicky/testtf/tf/lib/python3.9/site-packages/tensorflow/python/framework/ops.py", line 1094, in numpy
    maybe_arr = self._numpy()  # pylint: disable=protected-access
  File "/home/vicky/testtf/tf/lib/python3.9/site-packages/tensorflow/python/framework/ops.py", line 1062, in _numpy
    six.raise_from(core._status_to_exception(e.code, e.message), None)  # pylint: disable=protected-access
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InternalError: Could not synchronize CUDA stream: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered

This is the sample code I am running:

import tensorflow as tf
from tensorflow import keras
import numpy as np
(X_train, y_train), (X_test, y_test) = keras.datasets.cifar10.load_data()
# scaling image values between 0-1
X_train_scaled = X_train/255
X_test_scaled = X_test/255
# one hot encoding labels
y_train_encoded = keras.utils.to_categorical(y_train, num_classes = 10, dtype = 'float32')
y_test_encoded = keras.utils.to_categorical(y_test, num_classes = 10, dtype = 'float32')
def get_model():
    model = keras.Sequential([
        keras.layers.Flatten(input_shape=(32,32,3)),
        keras.layers.Dense(3000, activation='relu'),
        keras.layers.Dense(1000, activation='relu'),
        keras.layers.Dense(10, activation='sigmoid')
    ])
    model.compile(optimizer='SGD',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
    return model
with tf.device('/GPU:0'):
    model_gpu = get_model()
    model_gpu.fit(X_train_scaled, y_train_encoded, epochs = 10)

Have you tried using MirroredStrategy? And also [dynamic memory growth?](https://stackoverflow.com/questions/34199233/how-to-prevent-tensorflow-from-allocating-the-totality-of-a-gpu-memory) (Not sure if it'd help) — Djinn, Jun 27 '22 at 06:37
Please clarify your specific problem or provide additional details to highlight exactly what you need. As it's currently written, it's hard to tell exactly what you're asking. — Community, Jun 27 '22 at 09:52
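
For anyone who wants to try what the first comment suggests, here is a minimal sketch, not a confirmed fix for this error. It assumes TensorFlow 2.x and uses `tf.config.experimental.set_memory_growth` and `tf.distribute.MirroredStrategy`. The helper names `restrict_to_gpu`, `enable_memory_growth`, and `build_mirrored` are hypothetical and only there for illustration. `CUDA_VISIBLE_DEVICES` is the standard CUDA environment variable for hiding devices, which may help check whether the error follows one particular card.

```python
import os


def restrict_to_gpu(index):
    """Expose only one physical GPU to CUDA.

    Must be called before TensorFlow is imported, because the CUDA
    runtime reads this variable when it initializes.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = str(index)
    return os.environ["CUDA_VISIBLE_DEVICES"]


def enable_memory_growth():
    """Ask TensorFlow to allocate GPU memory on demand instead of up front."""
    import tensorflow as tf  # imported lazily so the env var above applies first

    gpus = tf.config.list_physical_devices("GPU")
    for gpu in gpus:
        # This has to run before any GPU has been initialized;
        # otherwise TensorFlow raises a RuntimeError.
        tf.config.experimental.set_memory_growth(gpu, True)
    return gpus


def build_mirrored(build_fn):
    """Create and compile a Keras model inside a MirroredStrategy scope.

    build_fn is any zero-argument function that returns a compiled model,
    for example get_model from the question.
    """
    import tensorflow as tf

    strategy = tf.distribute.MirroredStrategy()
    with strategy.scope():
        return build_fn()
```

In the script from the question, one would call `enable_memory_growth()` right after the imports and replace the `with tf.device('/GPU:0'):` block with `model_gpu = build_mirrored(get_model)` followed by the same `fit` call. To test each card on its own, call `restrict_to_gpu(0)` (or `1`) at the very top of the script, before `import tensorflow`.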
