I am trying to run multiple instances of an op that depends on a shared kernel (a TensorFlow variable) K, in parallel.
From the tensorflow FAQ:
The Session API allows multiple concurrent steps (i.e., calls to tf.Session.run in parallel). This enables the runtime to get higher throughput, if a single step does not use all of the resources in your computer.
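For reference, the concurrency the FAQ describes comes from issuing separate Session.run calls from multiple threads, not from passing a longer fetch list to a single run call. Below is a minimal sketch of that threaded pattern using a plain-Python stand-in for sess.run (the function run_step, its sleep, and the worker count are illustrative assumptions, not TensorFlow API):

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for one sess.run(op_) call. TensorFlow kernels
# release the GIL while executing, which is what lets threaded run calls
# actually overlap; time.sleep also releases the GIL, so it mimics that.
def run_step(i):
    time.sleep(0.1)  # simulates a kernel running outside the GIL
    return i * i

start = time.time()
with ThreadPoolExecutor(max_workers=4) as pool:
    # Four "steps" issued concurrently, one per thread.
    vals = list(pool.map(run_step, range(4)))
elapsed = time.time() - start

print(vals)     # [0, 1, 4, 9]
print(elapsed)  # roughly 0.1 s rather than 0.4 s, because the steps overlap
```

If the steps were serialized, the wall time would be about the sum of the individual step times; overlap shows up as wall time close to the longest single step.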
My code looks similar to this:
def some_op(K):
    # Do some processing on the shared kernel K
    return some_value

K = tf.random_uniform([kernel_size, kernel_size], 0, 1, dtype=tf.float32)

op_ = some_op(K)
op_list = []
for i in range(n_experiments):
    op_list.append(op_)

with tf.Session() as sess:
    init = tf.global_variables_initializer()
    sess.run(init)
    vals = sess.run(op_list)
    print(vals)
I am getting no speedup at all. The runtimes are as follows:

n_experiments, runtime (s):
- 1, 2.35
- 5, 10.32
- 10, 24.58