Invoking GPU asm compilation is supported on Cuda non-Windows platforms only. Relying on driver to perform ptx compilation

Question

I am trying to use my GPU with TensorFlow 2.3.0 on a simple MNIST model. I have installed CUDA 10.1 and cuDNN 7.6.5. It seems to work (the model is faster than before, at 2 seconds per epoch), but Task Manager makes it look as if the GPU is barely being used, which suggests training could be much faster. I have seen other questions about this warning on SO, but the answers all pointed to the use of data generators, which I am not using. I tried the solution mentioned in the comments on "Tensorflow-gpu not using GPU while fitting model", but it did not help. My Jupyter notebook output is as follows:

[I 17:06:09.421 NotebookApp] JupyterLab extension loaded from C:\Users\jsmith\Anaconda3\lib\site-packages\jupyterlab
[I 17:06:09.421 NotebookApp] JupyterLab application directory is C:\Users\jsmith\Anaconda3\share\jupyter\lab
[I 17:06:09.423 NotebookApp] Serving notebooks from local directory: C:\Users\jsmith
[I 17:06:09.424 NotebookApp] The Jupyter Notebook is running at:
[I 17:06:09.424 NotebookApp] http://localhost:8888/?token=4da96629c2f5d3e118e50a083d16b21990572a21f3bf04ad
[I 17:06:09.424 NotebookApp]  or http://127.0.0.1:8888/?token=4da96629c2f5d3e118e50a083d16b21990572a21f3bf04ad
[I 17:06:09.424 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 17:06:09.460 NotebookApp]

    To access the notebook, open this file in a browser:
        file:///C:/Users/jsmith/AppData/Roaming/jupyter/runtime/nbserver-13024-open.html
    Or copy and paste one of these URLs:
        http://localhost:8888/?token=4da96629c2f5d3e118e50a083d16b21990572a21f3bf04ad
     or http://127.0.0.1:8888/?token=4da96629c2f5d3e118e50a083d16b21990572a21f3bf04ad
[I 17:06:17.565 NotebookApp] Kernel started: 02385ecf-f682-496e-a056-9442356a7642
[I 17:06:30.004 NotebookApp] Starting buffering for 02385ecf-f682-496e-a056-9442356a7642:10507d95e443431392e5aa3a711b2952
[I 17:06:30.237 NotebookApp] Kernel restarted: 02385ecf-f682-496e-a056-9442356a7642
[I 17:06:30.824 NotebookApp] Restoring connection for 02385ecf-f682-496e-a056-9442356a7642:10507d95e443431392e5aa3a711b2952
[I 17:06:30.824 NotebookApp] Replaying 3 buffered messages
2021-01-11 17:06:31.365107: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2021-01-11 17:06:33.450527: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library nvcuda.dll
2021-01-11 17:06:33.480765: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 1650 Ti computeCapability: 7.5
coreClock: 1.485GHz coreCount: 16 deviceMemorySize: 4.00GiB deviceMemoryBandwidth: 178.84GiB/s
2021-01-11 17:06:33.480922: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2021-01-11 17:06:33.485959: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2021-01-11 17:06:33.489200: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cufft64_10.dll
2021-01-11 17:06:33.490830: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library curand64_10.dll
2021-01-11 17:06:33.494440: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusolver64_10.dll
2021-01-11 17:06:33.496906: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusparse64_10.dll
2021-01-11 17:06:33.510665: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2021-01-11 17:06:33.510964: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2021-01-11 17:06:33.511780: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-01-11 17:06:33.520884: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1ac3b7d9710 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-01-11 17:06:33.520974: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2021-01-11 17:06:33.521624: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 1650 Ti computeCapability: 7.5
coreClock: 1.485GHz coreCount: 16 deviceMemorySize: 4.00GiB deviceMemoryBandwidth: 178.84GiB/s
2021-01-11 17:06:33.522001: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2021-01-11 17:06:33.522330: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2021-01-11 17:06:33.522841: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cufft64_10.dll
2021-01-11 17:06:33.523498: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library curand64_10.dll
2021-01-11 17:06:33.523780: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusolver64_10.dll
2021-01-11 17:06:33.525973: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusparse64_10.dll
2021-01-11 17:06:33.526177: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2021-01-11 17:06:33.527458: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2021-01-11 17:06:34.029958: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-01-11 17:06:34.030108: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263]      0
2021-01-11 17:06:34.031482: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0:   N
2021-01-11 17:06:34.032610: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 2905 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1650 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5)
2021-01-11 17:06:34.037750: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1ac659c5460 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-01-11 17:06:34.037828: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce GTX 1650 Ti, Compute Capability 7.5
2021-01-11 17:06:34.213785: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 1650 Ti computeCapability: 7.5
coreClock: 1.485GHz coreCount: 16 deviceMemorySize: 4.00GiB deviceMemoryBandwidth: 178.84GiB/s
2021-01-11 17:06:34.214061: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2021-01-11 17:06:34.215834: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2021-01-11 17:06:34.219921: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cufft64_10.dll
2021-01-11 17:06:34.220345: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library curand64_10.dll
2021-01-11 17:06:34.220737: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusolver64_10.dll
2021-01-11 17:06:34.221081: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusparse64_10.dll
2021-01-11 17:06:34.221561: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2021-01-11 17:06:34.221816: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2021-01-11 17:06:34.222153: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-01-11 17:06:34.222309: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263]      0
2021-01-11 17:06:34.222660: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0:   N
2021-01-11 17:06:34.222923: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 2905 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1650 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5)
2021-01-11 17:06:35.001255: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2021-01-11 17:06:35.220686: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2021-01-11 17:06:36.204198: W tensorflow/stream_executor/gpu/redzone_allocator.cc:314] Internal: Invoking GPU asm compilation is supported on Cuda non-Windows platforms only
Relying on driver to perform ptx compilation.
Modify $PATH to customize ptxas location.
This message will be only logged once.

This is my code for getting the data:

import numpy as np
from tensorflow import keras

num_classes = 10
input_shape = (28, 28, 1)

# the data, split between train and test sets
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Scale images to the [0, 1] range
x_train = x_train.astype("float32") / 255
x_test = x_test.astype("float32") / 255
# Make sure images have shape (28, 28, 1)
x_train = np.expand_dims(x_train, -1)
x_test = np.expand_dims(x_test, -1)
print("x_train shape:", x_train.shape)
print(x_train.shape[0], "train samples")
print(x_test.shape[0], "test samples")


# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

And this is my model:

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential(
    [
        keras.Input(shape=input_shape),
        layers.Conv2D(32, kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Conv2D(64, kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Flatten(),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ]
)

batch_size = 128
epochs = 60

model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])

# The log above lists a single GPU (device 0), so it is addressed as /GPU:0
with tf.device('/GPU:0'):
    model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, validation_split=0.1)

This is Task Manager. You can see that almost all of the GPU memory is in use, yet only 4% of the GPU is being utilized, while 45% of the CPU is being used.

We shouldn't use `Task Manager` to check whether the `GPU` is being used by `Tensorflow`. For more details, you can refer to [this](https://stackoverflow.com/a/62772750/14290681). Thanks! — , Jan 14 '21 at 11:12
@TFer2 Thank you, it turns out that `tf.test.is_built_with_cuda()` returns `True`, meaning it is running on the GPU. Do you happen to know what caused the warning(`Invoking GPU asm compilation is supported on Cuda non-Windows platforms only Relying on driver to perform ptx compilation.`), though? — John Smith, Jan 14 '21 at 13:51

score 4 · Accepted Answer · answered Mar 12 '21 at 11:12

On that Task Manager page, select the drop-down next to Video Encode and change it to CUDA. You will then see the GPU activity for TensorFlow. It was not obvious to me either, but you are simply looking at the wrong part of the GPU activity.
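
The same thing can also be checked from Python rather than from Task Manager. The following is a minimal sketch, assuming TensorFlow 2.x; the helper name `gpu_report` is hypothetical and chosen only for illustration. It lists the GPUs TensorFlow can see and runs one small matrix multiplication with default placement to record which device executed it.

```python
# Sketch: ask TensorFlow itself where an op ran (helper name is hypothetical).
try:
    import tensorflow as tf
except ImportError:  # keeps the sketch importable on machines without TensorFlow
    tf = None


def gpu_report():
    """Return a dict describing GPU visibility and actual op placement."""
    if tf is None:
        return {"tensorflow_installed": False}

    gpus = tf.config.list_physical_devices("GPU")
    # Run one small op with default placement and record where it executed.
    product = tf.matmul(tf.random.uniform((256, 256)), tf.random.uniform((256, 256)))
    return {
        "tensorflow_installed": True,
        "built_with_cuda": tf.test.is_built_with_cuda(),
        "visible_gpus": [gpu.name for gpu in gpus],
        "matmul_device": product.device,
    }


if __name__ == "__main__":
    for key, value in gpu_report().items():
        print(f"{key}: {value}")
```

If `matmul_device` ends in `GPU:0`, the op ran on the GPU. Note that `built_with_cuda` on its own only says that the installed build supports CUDA, not that any op was actually placed on the GPU.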

William Heymann
