I'm curious about model parallelism, and I've read the example code from Yaroslav Bulatov. In that example, we have to manually partition our model (or, in TensorFlow terms, the graph) into different partitions (`left_network` & `right_network`). So I was wondering: if I have to make the partitions manually, what do `simple_placer.cc` and `graph_partition.cc` actually do to the whole graph? I'm still not clear on this at all.

In my understanding (let me know if anything is wrong): if the graph has 8 partitions (subgraphs), which can be seen as 8 jobs, and there are 4 workers, then the partitions can be distributed to the workers through:
- explicit annotations via `tf.device()`, or
- distributed training with `tf.train.replica_device_setter()`, which shares the variables across parameter servers and otherwise puts all ops on the worker device (see the sketch after this list)
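To make sure I'm understanding those two options correctly, here is a minimal sketch of what I mean (the job/task device names are just hypothetical examples, and this only shows graph construction, not a running cluster):

```python
import tensorflow as tf

# Option 1: explicit annotation -- pin ops to devices by hand.
with tf.device("/job:worker/task:0"):
    a = tf.constant([[1.0, 2.0]])
with tf.device("/job:worker/task:1"):
    b = tf.constant([[3.0], [4.0]])
    c = tf.matmul(a, b)  # cross-device inputs are sent between tasks automatically

# Option 2: replica_device_setter -- variables go to parameter servers
# (round-robin by default), everything else stays on the worker device.
with tf.device(tf.train.replica_device_setter(
        ps_tasks=2, worker_device="/job:worker/task:0")):
    W = tf.Variable(tf.zeros([784, 10]))  # placed on /job:ps/task:0
    v = tf.Variable(tf.zeros([10]))       # placed on /job:ps/task:1
    x = tf.placeholder(tf.float32, [None, 784])
    y = tf.matmul(x, W) + v               # placed on the worker
```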
But how does TensorFlow actually make the graph partitions? I want to trace what each subgraph (set of op-nodes) looks like. Can I dump the result, or do I need to trace/modify some source file?
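From what I've read so far, it looks like the post-partition graphs can be dumped through `RunMetadata` without modifying any source file; a minimal sketch of what I have in mind (assuming TF 1.x, and please correct me if this isn't the right mechanism):

```python
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 784])
W = tf.Variable(tf.zeros([784, 10]))
y = tf.matmul(x, W)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    opts = tf.RunOptions(output_partition_graphs=True)
    meta = tf.RunMetadata()
    sess.run(y, feed_dict={x: [[0.0] * 784]},
             options=opts, run_metadata=meta)
    # One GraphDef per partition, i.e. per device the graph was split across.
    for i, g in enumerate(meta.partition_graphs):
        print("partition", i, "has", len(g.node), "nodes")
```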
Please let me know if any of my concepts are wrong or vague. I'm a rookie at this; any opinion is appreciated.
In the code below, is `matmul` an op-node, and would it be partitioned into different jobs?

```python
y_ = tf.placeholder(tf.float32, [None, 10])
x = tf.placeholder(tf.float32, [None, 784])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
y = tf.matmul(x, W) + b
```