Many TensorFlow operations have shared_name as an optional argument. For example make_initializable_iterator (related question), most (all?) TF resources (variables, TensorArray, ...), ConditionalAccumulator, _MutableDenseHashTable, FIFOQueue (related issue), etc.

The documentation often says something like this:

shared_name: If non-empty, this table will be shared under the given name across multiple sessions.

But how does that work? How do I actually share that resource, tensor, or op (or whatever exactly is being shared) across multiple sessions?

Would that be multiple sessions in the same process? Or multiple sessions across multiple processes/machines (remotely)?

Would it share the same memory (only possible if within the same process, or at least same host, by using shared memory)? Or how else would it synchronize the state?

And is Graph.container related to that? From that doc:

Stateful operations, such as variables and queues, can maintain their states on devices so that they can be shared by multiple processes.

How does the sharing across multiple processes work?

And is distributed TensorFlow (tf.distribute) related to that? How?

Or remote_call? (See also this question.)

Albert
  • Fwiw, relevant C++ code: [`resource_handle.h`](https://github.com/tensorflow/tensorflow/blob/v2.2.0/tensorflow/core/framework/resource_handle.h) / [`resource_handle.cc`](https://github.com/tensorflow/tensorflow/blob/v2.2.0/tensorflow/core/framework/resource_handle.cc) and [`resource_mgr.h`](https://github.com/tensorflow/tensorflow/blob/v2.2.0/tensorflow/core/framework/resource_mgr.h) / [`resource_mgr.cc`](https://github.com/tensorflow/tensorflow/blob/v2.2.0/tensorflow/core/framework/resource_mgr.cc). It does seem to be what `Graph.container` refers to, but there are no clear docs about it. – jdehesa Jun 01 '20 at 10:02
  • I think a [`ResourceHandlesOp`](https://github.com/tensorflow/tensorflow/blob/v2.2.0/tensorflow/core/framework/resource_mgr.h#L487-L503) is created in the graph for each type of resource (registered with `REGISTER_RESOURCE_HANDLE_KERNEL`), and they hold the containers for their type of resource, which can be shared across sessions. Resources without `shared_name` get an [`ANONYMOUS_NAME`](https://github.com/tensorflow/tensorflow/blob/v2.2.0/tensorflow/core/framework/resource_handle.h#L88-L91) that is replaced by a unique id later. – jdehesa Jun 01 '20 at 10:15
  • The mechanism appears to have been designed first for variables (see [here](https://github.com/tensorflow/tensorflow/blob/v2.2.0/tensorflow/core/kernels/resource_variable_ops.h) and [here](https://github.com/tensorflow/tensorflow/blob/v2.2.0/tensorflow/core/kernels/resource_variable_ops.cc)) and then extended to other resources. Now there are [two variable handling ops](https://github.com/tensorflow/tensorflow/blob/v2.2.0/tensorflow/core/ops/resource_variable_ops.cc#L78-L131), `VarHandleOp` and `_VarHandlesOp`, the first using the `VarHandleOp` kernel and the other `ResourceHandlesOp`. – jdehesa Jun 01 '20 at 10:17
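
To make the lookup scheme described in the comments above more concrete, here is a toy pure-Python model. It is not TensorFlow code and does not call any TensorFlow API. It only assumes that resources live in a manager keyed by container, resource type, and name, and that unnamed resources receive a unique name so they are never shared. `ToyResourceManager`, `lookup_or_create`, `clear_container`, and the placeholder value of `ANONYMOUS_NAME` are all hypothetical names invented for this sketch.

```python
import itertools

# Placeholder sentinel for "no shared_name given" (assumption: not the
# value TensorFlow actually uses).
ANONYMOUS_NAME = "<anonymous>"


class ToyResourceManager:
    """Toy stand-in for a resource manager (hypothetical, not TensorFlow)."""

    def __init__(self):
        # container name -> {(resource type, resource name) -> resource}
        self._containers = {}
        self._unique_ids = itertools.count()

    def lookup_or_create(self, container, type_name, name, factory):
        """Return the resource stored under the key, creating it if absent."""
        if name == ANONYMOUS_NAME:
            # Unnamed resources get a fresh name, so two lookups never collide.
            name = "_anonymous_%d" % next(self._unique_ids)
        resources = self._containers.setdefault(container, {})
        key = (type_name, name)
        if key not in resources:
            resources[key] = factory()
        return resources[key]

    def clear_container(self, container):
        """Drop every resource stored in the given container."""
        self._containers.pop(container, None)


manager = ToyResourceManager()

# Two lookups with the same container, type, and name see the same object,
# which models two sessions using the same shared_name.
queue_a = manager.lookup_or_create("", "queue", "my_queue", list)
queue_b = manager.lookup_or_create("", "queue", "my_queue", list)
queue_a.append(1)

# Lookups without a name each get their own object.
anon_a = manager.lookup_or_create("", "queue", ANONYMOUS_NAME, list)
anon_b = manager.lookup_or_create("", "queue", ANONYMOUS_NAME, list)

# Clearing the container discards the shared state.
manager.clear_container("")
queue_c = manager.lookup_or_create("", "queue", "my_queue", list)
```

In this model, sharing only happens between callers that reach the same manager object. Whether two real sessions reach the same manager (same process, same server, or remote) is exactly the open question here, and the sketch does not answer it.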
