
In distributed TensorFlow, I need to process input data on one worker and consume it in other sessions. "make_initializable_iterator" has an undocumented parameter "shared_name", but how can I initialize the iterator without creating the dataset in every session?

def make_initializable_iterator(self, shared_name=None):
    """Creates an `Iterator` for enumerating the elements of this dataset.
    Note: The returned iterator will be in an uninitialized state,
    and you must run the `iterator.initializer` operation before using it"""

To put it more clearly: if I define an iterator with "shared_name", how do I use this iterator in another session?

PasserbyD
  • After some experiments, I found that the iterator should only be initialized in the session that creates it. So the real question is how to check whether an iterator has been initialized in a distributed session. – PasserbyD Apr 26 '19 at 07:51
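
(Regarding the check mentioned in the comment above: there is no documented API for it in TF 1.x. A workaround, sketched here under the assumption that the consuming session has built an iterator "it" with the same shared_name, is to probe GetNext and catch the FailedPreconditionError that an uninitialized iterator raises. Note that a successful probe consumes one element.)

next_element = it.get_next()

def iterator_is_initialized(sess, next_element):
    # Probe the iterator; an uninitialized iterator raises
    # "GetNext() failed because the iterator has not been initialized".
    try:
        sess.run(next_element)              # consumes one element on success
        return True
    except tf.errors.FailedPreconditionError:
        return False
    except tf.errors.OutOfRangeError:
        return True                         # initialized but already exhausted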

1 Answer


The iter_init_op might be what you are searching for:

import multiprocessing
import tensorflow as tf

# This is how an input pipeline usually looks
# (file_list and augmentation_function are defined elsewhere):
ncores = multiprocessing.cpu_count()
dataset = tf.data.Dataset.from_tensor_slices(file_list)
dataset = dataset.map(augmentation_function, num_parallel_calls=ncores)
batch = dataset.shuffle(batch_size).batch(batch_size).prefetch(5)

# construct the iterator
it = batch.make_initializable_iterator(shared_name='shared_iterator')
iter_init_op = it.initializer  # run this op inside a session to initialize the iterator

Within the session:

with tf.Session() as sess:
    ...
    for epoch in range(nb_epoch):
        # (re)initialize the iterator at the start of each epoch
        sess.run(iter_init_op)
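
For completeness, a minimal sketch of how the batches might then be consumed (the train_op below is hypothetical, only there to show where a training step would go):

next_batch = it.get_next()

with tf.Session() as sess:
    for epoch in range(nb_epoch):
        # (re)initialize the iterator at the start of every epoch
        sess.run(iter_init_op)
        while True:
            try:
                batch_data = sess.run(next_batch)
                # sess.run(train_op, feed_dict=...)   # hypothetical training step
            except tf.errors.OutOfRangeError:
                break   # dataset exhausted for this epoch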
Zézouille
  • It's not about the initializer op, but about how I can get the "batch" dataset in other sessions. – PasserbyD Apr 15 '19 at 07:27
  • I might have misunderstood your question because it isn't clear. By doing this, one can process input data on the CPU and prefetch it so it is ready for the GPU. The iterator.initializer is called outside the training loop. The prefetch lets you keep a queue of data ready for training to maximize GPU occupancy. Check [this](https://www.tensorflow.org/guide/performance/datasets). – Zézouille Apr 15 '19 at 08:10
  • @PasserbyD Can you show more code to illustrate what you want, please? – Zézouille Apr 24 '19 at 21:44
  • I've already figured it out. Define an iterator with shared_name via "make_initializable_iterator" in one session, and use that iterator in another session. The "user" session does not need to initialize the iterator, but there is no proper way to check whether it has been initialized. – PasserbyD Apr 26 '19 at 07:45
  • @PasserbyD Tell me if I'm wrong: your "user" session is a session that restores the graph and weights? I think you need to initialize at least once in one of your sessions, otherwise you might get "FailedPreconditionError (see above for traceback): GetNext() failed because the iterator has not been initialized." – Zézouille Apr 26 '19 at 09:12
  • The situation is a distributed setup: you can create the iterator in one worker's session and use it in another "user" worker. That way you can have one global input processor shared among all workers on different hosts. – PasserbyD Apr 28 '19 at 03:37
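
(A minimal single-process sketch of the setup PasserbyD describes. The cluster spec, port, and toy dataset are assumptions for illustration; in practice the two sessions would live in different worker processes connecting to the same cluster. The key point is that both graphs build the iterator with the same shared_name, and only the first session runs the initializer.)

import tensorflow as tf

cluster = tf.train.ClusterSpec({"worker": ["localhost:2222"]})
server = tf.train.Server(cluster, job_name="worker", task_index=0)

def build_shared_iterator():
    # Both sessions build an identical pipeline; shared_name ties the two
    # iterator ops to the same underlying resource on the worker.
    dataset = tf.data.Dataset.range(100).batch(10)   # toy dataset (assumption)
    return dataset.make_initializable_iterator(shared_name='shared_iterator')

# "producer" session: creates and initializes the iterator
with tf.Graph().as_default():
    it = build_shared_iterator()
    with tf.Session(server.target) as sess:
        sess.run(it.initializer)

# "user" session: same graph construction, same shared_name, no initialization
with tf.Graph().as_default():
    it = build_shared_iterator()
    with tf.Session(server.target) as sess:
        print(sess.run(it.get_next()))   # first batch produced by the shared iterator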