I need to repeatedly average large numbers of long vectors, and I would like this to run on my GPU.
Monitoring nvtop and htop while this runs, I see that the GPU (which always shows high activity when I train Keras models) is not being used at all for these operations, while CPU usage surges.
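Since the GPU clearly works from Keras, I assume TensorFlow can see it; a quick sanity check along these lines (TF 2.x API) should confirm the device is visible:

import tensorflow as tf

# list the GPUs TensorFlow can see
print(tf.config.list_physical_devices('GPU'))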
I have simulated the job in the code snippet below (trying to minimize the non-TF work). What am I doing wrong?
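For reference, the per-iteration result I am after is just the element-wise mean of the 100 vectors, roughly the equivalent of this plain NumPy computation (only to clarify the intent, not the code in question):

import numpy as np

vecs = np.random.rand(100, 300)   # 100 vectors of length 300
mean_vec = vecs.mean(axis=0)      # element-wise mean over the vectors, shape (300,)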
import os
import numpy as np
import tensorflow as tf
from tensorflow.math import add_n, scalar_mul

tf.debugging.set_log_device_placement(True)

config = tf.compat.v1.ConfigProto()  # session config (default)
sess = tf.compat.v1.Session(config=config)
tf.compat.v1.keras.backend.set_session(sess)
os.environ["CUDA_VISIBLE_DEVICES"] = "1"  # expose only GPU 1 to CUDA

# Make a random numpy matrix: 100 vectors of length 300
vecs = np.random.rand(100, 300)

with sess.as_default():
    with tf.device('/GPU:0'):
        for _ in range(1000):
            # vecs = np.random.rand(100, 300)
            tf_vecs = tf.Variable(vecs, dtype=tf.float64)
            tf_invlgt = tf.Variable(1 / np.shape(vecs)[0], dtype=tf.float64)
            # unstack the rows, sum them, and scale by 1/N to get the mean vector
            vectors = tf.unstack(tf_vecs)
            sum_vecs = add_n(vectors)
            mean_vec = tf.Variable(scalar_mul(tf_invlgt, sum_vecs))
Thanks
Michael