
I'm studying machine learning and TensorFlow. I saw in the documentation that TensorFlow uses multiple threads by default, so I tried to verify this through logging.

with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    x = tf.constant(2)
    y2 = x - 66
    y1 = x + 300
    y = y1 + y2
    result = sess.run(y)
    print(result)

Then I get the output below.

Const: (Const): /job:localhost/replica:0/task:0/cpu:0
I tensorflow/core/common_runtime/simple_placer.cc:827] Const: (Const)/job:localhost/replica:0/task:0/cpu:0
add: (Add): /job:localhost/replica:0/task:0/cpu:0
I tensorflow/core/common_runtime/simple_placer.cc:827] add: (Add)/job:localhost/replica:0/task:0/cpu:0
sub: (Sub): /job:localhost/replica:0/task:0/cpu:0
I tensorflow/core/common_runtime/simple_placer.cc:827] sub: (Sub)/job:localhost/replica:0/task:0/cpu:0
add_1: (Add): /job:localhost/replica:0/task:0/cpu:0
I tensorflow/core/common_runtime/simple_placer.cc:827] add_1: (Add)/job:localhost/replica:0/task:0/cpu:0
add/y: (Const): /job:localhost/replica:0/task:0/cpu:0
I tensorflow/core/common_runtime/simple_placer.cc:827] add/y: (Const)/job:localhost/replica:0/task:0/cpu:0
sub/y: (Const): /job:localhost/replica:0/task:0/cpu:0
I tensorflow/core/common_runtime/simple_placer.cc:827] sub/y: (Const)/job:localhost/replica:0/task:0/cpu:0
238

It looks like the ops run in parallel (while sess.run itself is synchronous). Is that right, and is that the default behaviour?

Device placement is just going to tell you which device (CPU/GPU) the ops are run on. To figure out parallelism on the CPU, you could look at the [timeline](http://stackoverflow.com/questions/34293714/can-i-measure-the-execution-time-of-individual-operations-with-tensorflow#37774470). In general there is both intra- and inter-op parallelism. – Allen Lavoie Feb 06 '17 at 20:15
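
Both kinds of parallelism mentioned in that comment can be tuned through ConfigProto. Below is a minimal sketch (the thread counts here are arbitrary values for experimentation, not recommendations; 0, the default, lets TensorFlow choose):

import tensorflow as tf

config = tf.ConfigProto(
    intra_op_parallelism_threads=1,  # threads used inside a single op (e.g. a large matmul)
    inter_op_parallelism_threads=1,  # threads used to run independent ops concurrently
    log_device_placement=True)

# Setting both pools to 1 serializes execution, which makes the default
# (parallel) behaviour easier to spot by comparison in the timeline.
with tf.Session(config=config) as sess:
    x = tf.constant(2)
    y = (x + 300) + (x - 66)
    print(sess.run(y))  # 238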

1 Answer

Check the implementation in direct_session.cc and executor.cc, where an ExecutorState is created for each partition of the graph and a thread pool is used to execute each node that is ready to be processed.

In the function below, ready is a vector initialised with the input nodes of the graph, so each of them starts on a separate thread and continues executing the nodes in its branch whose dependencies are satisfied. As processing proceeds across threads, the dependencies of further nodes become satisfied and those nodes are executed in turn.

void ExecutorState::RunAsync(Executor::DoneCallback done) {
  ...
  // Schedule to run all the ready ops in thread pool.
  ScheduleReady(ready, nullptr);
}

ExecutorState::ScheduleReady is the function that triggers the parallelism. It is called wherever nodes become ready to be processed, which includes ExecutorState::RunAsync and ExecutorState::NodeDone.
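
For intuition, here is a minimal sketch of that pattern in Python (a toy reimplementation for illustration, not TensorFlow's actual code; the graph, the pending counts, and the helper names are all made up). Nodes with no unfinished inputs are seeded into a ready set, a thread pool runs them, and every completed node decrements its successors' pending counts and schedules any that reach zero, mirroring RunAsync, NodeDone, and ScheduleReady:

import threading
from concurrent.futures import ThreadPoolExecutor

# Toy graph mirroring the question: y1 = x + 300, y2 = x - 66, y = y1 + y2.
successors = {"x": ["y1", "y2"], "y1": ["y"], "y2": ["y"], "y": []}
pending = {"x": 0, "y1": 1, "y2": 1, "y": 2}  # unfinished inputs per node

lock = threading.Lock()
finished = threading.Event()
pool = ThreadPoolExecutor(max_workers=4)  # plays the role of the inter-op thread pool

def schedule_ready(ready):
    # Like ExecutorState::ScheduleReady: hand every ready node to the pool.
    for node in ready:
        pool.submit(run_node, node)

def run_node(node):
    print("running %s on %s" % (node, threading.current_thread().name))
    node_done(node)

def node_done(node):
    # Like ExecutorState::NodeDone: finishing a node may make successors ready.
    newly_ready = []
    with lock:
        for succ in successors[node]:
            pending[succ] -= 1
            if pending[succ] == 0:
                newly_ready.append(succ)
    schedule_ready(newly_ready)
    if node == "y":  # the output node; in this sketch it ends the run
        finished.set()

# Like RunAsync: seed the pool with nodes whose inputs are already available.
schedule_ready([n for n, count in pending.items() if count == 0])
finished.wait()
pool.shutdown()

Depending on scheduling, y1 and y2 may be picked up by different pool threads once x completes; that is the inter-op parallelism the question is asking about.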