
I need to perform a job that averages large numbers of long vectors multiple times, and I would like this to be done on my GPU.

Monitoring nvtop and htop while it runs, I see that the GPU (which always shows high activity when I train Keras models) is not being used at all in these operations, while CPU usage surges.

I have simulated it in the code snippet below (trying to minimize non-TF work).

What am I doing wrong?

import os
import numpy as np
import tensorflow as tf
from tensorflow.math import add_n, scalar_mul

# Must be set before the session is created to take effect
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

tf.debugging.set_log_device_placement(True)
config = tf.compat.v1.ConfigProto()
sess = tf.compat.v1.Session(config=config)
tf.compat.v1.keras.backend.set_session(sess)

# Make a random NumPy matrix: 100 vectors of length 300
vecs = np.random.rand(100, 300)

with sess.as_default():
    with tf.device('/GPU:0'):
        for _ in range(1000):
            # vecs = np.random.rand(100, 300)
            tf_vecs = tf.Variable(vecs, dtype=tf.float64)
            tf_invlgt = tf.Variable(1 / np.shape(vecs)[0], dtype=tf.float64)
            vectors = tf.unstack(tf_vecs)
            sum_vecs = add_n(vectors)
            mean_vec = tf.Variable(scalar_mul(tf_invlgt, sum_vecs))
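For reference, here is what I am trying to compute, written as a single reduction instead of unstack/add_n/scalar_mul (a minimal sketch, assuming TF 2.x eager execution; the column-wise mean should match the pipeline above):

```python
import numpy as np
import tensorflow as tf

# Same shape as above: 100 vectors of length 300
vecs = np.random.rand(100, 300)

# One reduction op replaces unstack + add_n + scalar_mul
mean_vec = tf.reduce_mean(tf.constant(vecs, dtype=tf.float64), axis=0)

# Agrees with the NumPy mean of the same matrix
print(np.allclose(mean_vec.numpy(), vecs.mean(axis=0)))
```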

Thanks

Michael

1 Answer


I might be wrong, but could it be that CUDA_VISIBLE_DEVICES should be "0", like

import os

# Make device numbering match what nvidia-smi reports,
# then expose only the first GPU to TensorFlow
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

See the GitHub comment here.

If it still does not work, you can also add a small piece of code to check whether TensorFlow can see the GPU devices:

from tensorflow.python.client import device_lib

def get_available_gpus():
    # List every device TensorFlow can see and keep only the GPUs
    local_device_protos = device_lib.list_local_devices()
    return [x.name for x in local_device_protos if x.device_type == 'GPU']

This is mentioned here
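A quick way to run that check (self-contained sketch using the same function; an empty list means TensorFlow sees no GPU at all, which would explain the CPU-only behaviour):

```python
from tensorflow.python.client import device_lib

def get_available_gpus():
    # Keep only the GPU entries from TensorFlow's device list
    local_device_protos = device_lib.list_local_devices()
    return [x.name for x in local_device_protos if x.device_type == 'GPU']

gpus = get_available_gpus()
print(gpus)  # e.g. ['/device:GPU:0'] when a GPU is visible, [] otherwise
```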

Dieter Maes