
I am trying to run, in parallel, multiple instances of an op that depends on a shared kernel K (a TensorFlow variable).

From the TensorFlow FAQ:

The Session API allows multiple concurrent steps (i.e. calls to tf.Session.run in parallel). This enables the runtime to get higher throughput, if a single step does not use all of the resources in your computer.
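To check my understanding, "concurrent steps" would mean something like issuing sess.run from several Python threads against the same session. A minimal sketch (the matmul workload and thread count here are placeholders, not my real op):

import threading
import tensorflow as tf

x = tf.random_uniform([1000, 1000])
y = tf.matmul(x, x)  # some stand-in work to repeat

with tf.Session() as sess:
    def step():
        sess.run(y)  # each thread issues its own step

    threads = [threading.Thread(target=step) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()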

My code looks similar to this:

import tensorflow as tf

kernel_size = 5       # illustrative value
n_experiments = 10    # illustrative value

def some_op(K):
    # Do some processing on shared K
    return some_value

K = tf.random_uniform([kernel_size, kernel_size], 0, 1, dtype=tf.float32)

op_ = some_op(K)

op_list = []
for i in range(n_experiments):
    op_list.append(op_)

with tf.Session() as sess:
    init = tf.global_variables_initializer()
    sess.run(init)
    vals = sess.run(op_list)
    print(vals)

I am getting no speedup at all. The runtimes are as follows:

n_experiments    runtime (s)
1                2.35
5                10.32
10               24.58
stochastic_zeitgeist

1 Answer


It looks like your op_list contains n_experiments copies of the same tf.Tensor. If that is the case, calling sess.run(op_list) will execute the op once, then make n_experiments copies of the result. If you want to invoke multiple instances of some_op(K) in parallel, you should rewrite the code as follows:

op_list = []
for i in range(n_experiments):
    # Each call to some_op(K) builds a distinct subgraph, so the runtime
    # can execute the n_experiments ops concurrently.
    op_list.append(some_op(K))
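To see the difference, you can compare fetching n_experiments references to one tensor against n_experiments distinct subgraphs. A rough sketch (some_op here is a stand-in workload, not your real op):

import time
import tensorflow as tf

kernel_size = 500
n_experiments = 10

def some_op(K):
    # Stand-in workload; substitute your real op.
    return tf.reduce_sum(tf.matmul(K, K))

K = tf.random_uniform([kernel_size, kernel_size], 0, 1, dtype=tf.float32)

same_op = [some_op(K)] * n_experiments                     # copies of one tensor
distinct_ops = [some_op(K) for _ in range(n_experiments)]  # separate subgraphs

with tf.Session() as sess:
    for name, ops in [('same', same_op), ('distinct', distinct_ops)]:
        start = time.time()
        sess.run(ops)  # duplicate fetches run once; distinct ops each run
        print(name, time.time() - start)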

Note that increasing n_experiments is likely to increase the execution time, because you're issuing more work to the same set of resources. If you want to increase the potential parallelism, you can use tf.ConfigProto when creating your session, as suggested in this answer.
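For example, a session created with an explicit tf.ConfigProto might look like this (the thread counts are illustrative; tune them for your machine):

config = tf.ConfigProto(inter_op_parallelism_threads=8,
                        intra_op_parallelism_threads=8)

with tf.Session(config=config) as sess:
    vals = sess.run(op_list)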

mrry