  1. I'm curious about model parallelism, and I've read the code from Yaroslav Bulatov. In that example, we have to partition our model (or, in TensorFlow terms, the graph) manually into different partitions (left_network & right_network). So I was wondering: if I have to make the partitions manually, what do simple_placer.cc and graph_partition.cc do to the whole graph? I'm still not clear on this at all.

  2. My understanding (let me know if anything is wrong): if the graph has 8 partitions (subgraphs), which can be seen as 8 jobs, and there are 4 workers, then the partitions can be distributed to the workers through:

  • explicit annotations via tf.device(), or
  • distributed training with tf.train.replica_device_setter(), which shares the variables across parameter servers and otherwise puts all ops on the worker device

But how does the graph get partitioned? I want to see what each subgraph (set of op nodes) looks like. Can I dump the result, or which code files do I need to trace/modify?

Please let me know if any of these concepts are wrong or vague. I'm a rookie at this; any opinion is appreciated.

  1. In the code below, is matmul an op node, and would it be partitioned into different jobs?

     y_ = tf.placeholder(tf.float32, [None, 10])
     x = tf.placeholder(tf.float32, [None, 784])
     W = tf.Variable(tf.zeros([784, 10]))
     b = tf.Variable(tf.zeros([10]))
     y = tf.matmul(x, W)  + b
    
Glorfindel
redfishleo

1 Answer


You can get the result of the placement algorithm by passing additional options when you call tf.Session.run():

# ...
y = tf.matmul(x, W) + b

sess = tf.Session()
options = tf.RunOptions(output_partition_graphs=True)
metadata = tf.RunMetadata()

sess.run(y, options=options, run_metadata=metadata)

# `metadata` now contains information about what happened during the `run()` call.
for partition in metadata.partition_graphs:

  # `partition` is a `tf.GraphDef` representing all the nodes that ran on a single
  # device. All nodes in `partition` have the same `device` value.
  device = partition.node[0].device

  for node in partition.node:
    # e.g. print each node or store it in a dictionary for further analysis.
    # ...
mrry