Tensorflow-gpu not using GPU while fitting model

Question

Sorry, I know this question has been asked many times, but none of the answers I found solved my problem. I have checked several times that my versions match the compatible versions listed on the official website. Version info:
System: Windows 10
IDE: PyCharm Professional
Tensorflow version: 2.3.0
CUDA version: 10.1
CUDNN version: cudnn-10.1-windows10-x64-v7.6.5.32_3
Python version: 3.7
environment variable added to PATH

I think TensorFlow already recognizes my GPU and can run on it, since I tried running:

import tensorflow as tf
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
sess = tf.compat.v1.Session(config=tf.compat.v1.ConfigProto(log_device_placement=True))

which shows

In addition, running print(tf.config.experimental.list_physical_devices('GPU')) outputs [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

but when I actually try to run a simple CNN:

import os
# os.environ['TF_CPP_MIN_LOG_LEVEL'] = "1"
import tensorflow as tf
import matplotlib.pyplot as plt
import numpy as np
from tensorflow import keras


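class MyCallback(keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        # Stop training once the epoch's training loss drops below 0.01.
        loss = (logs or {}).get("loss")
        if loss is not None and loss < .01:
            print("loss below .01, ending training")
            self.model.stop_training = True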


if __name__ == '__main__':
    # print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))
    callbacks = MyCallback()

    mnist = keras.datasets.fashion_mnist
    (trainX, trainY), (testX, testY) = mnist.load_data()
    # plt.imshow(trainX[0], cmap="gray")
    # plt.show()
    trainX = trainX / 255.0
    testX = testX / 255.0
    trainX = trainX.reshape(60000, 28, 28, 1)
    testX = testX.reshape(10000, 28, 28, 1)
    model = keras.models.Sequential([keras.layers.Conv2D(64, (3, 3), activation=tf.nn.leaky_relu,
                                                         input_shape=(28, 28, 1), padding="same"),
                                     keras.layers.MaxPooling2D(2, 2),
                                     keras.layers.Conv2D(64, (3, 3), activation=tf.nn.leaky_relu),
                                     keras.layers.MaxPooling2D(3, 3),
                                     keras.layers.Conv2D(64, (3, 3), activation=tf.nn.leaky_relu),
                                     keras.layers.Flatten(),
                                     keras.layers.Dense(256, activation=tf.nn.leaky_relu),
                                     # keras.layers.Dense(128, activation=tf.nn.leaky_relu),
                                     keras.layers.Dense(10, activation=tf.nn.softmax)])
    model.compile(optimizer=tf.optimizers.Adam(learning_rate=0.0012),
                  loss=tf.losses.sparse_categorical_crossentropy)
    model.fit(trainX, trainY, epochs=4000, verbose=2, callbacks=[callbacks])
    model.evaluate(testX, testY)

it shows:

2020-12-25 01:05:12.889545: W tensorflow/stream_executor/gpu/redzone_allocator.cc:314] Internal: Invoking GPU asm compilation is supported on Cuda non-Windows platforms only Relying on driver to perform ptx compilation. Modify $PATH to customize ptxas location. This message will be only logged once.

the first few lines of outputs:

Another reason I believe it is not using the GPU is that Task Manager shows GPU usage at <=3% while CPU usage is >=45%. Can somebody help? Thanks!

full warning:

2020-12-25 01:12:13.322490: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2020-12-25 01:12:16.212006: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library nvcuda.dll
2020-12-25 01:12:16.253006: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties: pciBusID: 0000:01:00.0 name: GeForce GTX 1660 Ti computeCapability: 7.5 coreClock: 1.59GHz coreCount: 24 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 268.26GiB/s
2020-12-25 01:12:16.254288: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2020-12-25 01:12:16.259076: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2020-12-25 01:12:16.262699: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cufft64_10.dll
2020-12-25 01:12:16.264358: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library curand64_10.dll
2020-12-25 01:12:16.268891: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusolver64_10.dll
2020-12-25 01:12:16.272625: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusparse64_10.dll
2020-12-25 01:12:16.282877: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2020-12-25 01:12:16.283119: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-12-25 01:12:16.283488: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations: AVX2 To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-12-25 01:12:16.291040: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x223ba066800 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-12-25 01:12:16.291333: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-12-25 01:12:16.293402: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties: pciBusID: 0000:01:00.0 name: GeForce GTX 1660 Ti computeCapability: 7.5 coreClock: 1.59GHz coreCount: 24 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 268.26GiB/s
2020-12-25 01:12:16.294634: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2020-12-25 01:12:16.294798: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2020-12-25 01:12:16.294981: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cufft64_10.dll
2020-12-25 01:12:16.295133: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library curand64_10.dll
2020-12-25 01:12:16.295285: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusolver64_10.dll
2020-12-25 01:12:16.295438: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusparse64_10.dll
2020-12-25 01:12:16.295620: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2020-12-25 01:12:16.296033: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-12-25 01:12:16.939243: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-12-25 01:12:16.939417: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263] 0
2020-12-25 01:12:16.939519: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0: N
2020-12-25 01:12:16.939768: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4615 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1660 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5)
2020-12-25 01:12:16.942524: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x223fb58b970 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-12-25 01:12:16.942780: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): GeForce GTX 1660 Ti, Compute Capability 7.5
Epoch 1/4000
2020-12-25 01:12:17.682064: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2020-12-25 01:12:17.988403: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2020-12-25 01:12:19.257223: W tensorflow/stream_executor/gpu/redzone_allocator.cc:314] Internal: Invoking GPU asm compilation is supported on Cuda non-Windows platforms only Relying on driver to perform ptx compilation. Modify $PATH to customize ptxas location. This message will be only logged once.
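
The line in this log that most directly says a GPU device was set up is the "Created TensorFlow device" entry. As a rough sketch, that entry can be picked out of a captured log with a regular expression. The function name find_created_gpu_devices and the pattern are illustrative assumptions based only on the TensorFlow 2.3 log text above; other versions may word the message differently.

```python
import re

# Pattern modelled on the "Created TensorFlow device" entry in the log above.
# Assumption: the wording matches TensorFlow 2.3; other versions may differ.
DEVICE_RE = re.compile(
    r"Created TensorFlow device \((?P<device>\S+) with (?P<memory_mb>\d+) MB memory\)"
    r" -> physical GPU \(device: \d+, name: (?P<name>[^,]+),"
)


def find_created_gpu_devices(log_text):
    """Return one dict per GPU device that the log reports as created."""
    return [
        {
            "device": match.group("device"),
            "memory_mb": int(match.group("memory_mb")),
            "name": match.group("name"),
        }
        for match in DEVICE_RE.finditer(log_text)
    ]


# The entry below is copied from the log in this question.
sample_log = (
    "2020-12-25 01:12:16.939768: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] "
    "Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4615 MB memory) "
    "-> physical GPU (device: 0, name: GeForce GTX 1660 Ti, pci bus id: 0000:01:00.0, "
    "compute capability: 7.5)"
)

devices = find_created_gpu_devices(sample_log)
print(devices)
```

An empty result would suggest that TensorFlow never created a GPU device in that run. A non-empty result only shows that the device exists, not how heavily it is used.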

Everything here indicates that the GPU is being used; you cannot use GPU utilization to infer whether the GPU is being used or not. I do not see a problem here. - Dr. Snoopy, Dec 24 '20 at 17:22
Thanks for answering. So does that mean I can just ignore the warning "Internal: Invoking GPU asm compilation is supported on Cuda non-Windows platforms only Relying on driver to perform ptx compilation. Modify $PATH to customize ptxas location. This message will be only logged once."? - seermer, Dec 24 '20 at 17:27
If you are in doubt, you could try to force device placement using with tf.device('/GPU:0'): (in case your device is not available, it will fail). It is definitely suspicious that your CPU utilization is high and your GPU utilization is low. - CrazyBrazilian, Dec 25 '20 at 04:44
Thanks! I think using with tf.device('/GPU:0'): works! GPU usage is now getting higher. - seermer, Dec 25 '20 at 05:21
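
For reference, a minimal sketch of the explicit placement suggested in the comments, assuming TensorFlow 2.1 or later (where tf.config.list_physical_devices is available). The fallback to '/CPU:0' is an addition for illustration so that the snippet also runs on a machine without a GPU. It is not part of the original suggestion.

```python
import tensorflow as tf

# Use the GPU if TensorFlow can see one; otherwise fall back to the CPU so
# the sketch still runs (the fallback is an illustrative assumption).
gpus = tf.config.list_physical_devices("GPU")
device_name = "/GPU:0" if gpus else "/CPU:0"

# Everything created inside this block is requested on device_name.
with tf.device(device_name):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name="a")
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name="b")
    c = tf.matmul(a, b)

# In eager mode each tensor records the device it was placed on.
print("requested:", device_name)
print("result placed on:", c.device)
print(c.numpy())
```

The same with block can wrap the model construction and the model.fit call from the question. Whether requesting an unavailable device raises an error or silently falls back to another device depends on the soft device placement setting (see tf.config.set_soft_device_placement), so checking c.device is a more direct confirmation than relying on an exception.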
