The TensorFlow queues offered the advantage that data could be fetched and enqueued independently of the rest of the graph, so the CPU/disk could pre-fetch data and the GPUs would not run dry.
I've read in a blog post that with the Dataset API this decoupling is missing again. However, the Dataset shuffle() function takes a buffer_size argument, which I would assume creates some kind of buffer queue. Is this the same as combining the Dataset API with a Queue (see the code below)? And is there a recommended way to create a proper, independent data-fetching queue?
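The closest thing I have found so far is Dataset.prefetch(), which, if I understand the docs correctly, keeps a background buffer filled while the model consumes elements. Below is only a sketch of what I have in mind (the generator and the buffer sizes are made-up placeholders, not my real pipeline), and I am not sure whether this actually runs independently of the training step the way the queue runners did:

import tensorflow as tf

def dummy_generator():
    # stand-in for my real data source (in practice this reads from disk)
    for i in range(1000):
        yield [float(i)]

dataset = tf.data.Dataset.from_generator(
    dummy_generator,
    output_types=tf.float32,
    output_shapes=tf.TensorShape([1]))
dataset = dataset.shuffle(buffer_size=30)   # shuffle within a 30-element buffer
dataset = dataset.batch(10)
dataset = dataset.prefetch(buffer_size=2)   # keep up to 2 batches ready ahead of time
sample = dataset.make_one_shot_iterator().get_next()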
Code example for Dataset API + Queue:
import tensorflow as tf

sample_set = tf.data.Dataset.from_generator(...)
sample = sample_set.make_one_shot_iterator().get_next()

# feed single Dataset elements into a queue-backed shuffle batch
sample_batch = tf.train.shuffle_batch([sample], batch_size=10,
                                      capacity=30, num_threads=1,
                                      min_after_dequeue=1)
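For completeness, this is how I currently consume that mixed version; as far as I know the shuffle_batch queue needs the queue runners to be started, so my setup (not necessarily the recommended one) looks like this:

with tf.Session() as sess:
    coord = tf.train.Coordinator()
    # start the background threads that feed the shuffle_batch queue
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    try:
        batch = sess.run(sample_batch)  # fetch one batch of 10 samples
    finally:
        coord.request_stop()
        coord.join(threads)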
Is this the same as the following, pure Dataset API version? (And how can I define the number of threads there?)
sample_set = tf.data.Dataset.from_generator(...)
sample_set = sample_set.shuffle(buffer_size=30)
sample_set = sample_set.batch(10)
sample = sample_set.make_one_shot_iterator().get_next()
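Regarding the number of threads: the only knob I have found so far is num_parallel_calls on map() (together with prefetch()), but I don't know whether that is the intended replacement for num_threads in shuffle_batch. A sketch of what I mean, reusing the dummy_generator from the sketch above; parse_fn and the value 4 are arbitrary placeholders:

def parse_fn(x):
    # hypothetical per-element preprocessing; my real pipeline would decode/augment here
    return x * 2.0

dataset = tf.data.Dataset.from_generator(
    dummy_generator,
    output_types=tf.float32,
    output_shapes=tf.TensorShape([1]))
dataset = dataset.map(parse_fn, num_parallel_calls=4)  # run preprocessing on 4 elements in parallel
dataset = dataset.shuffle(buffer_size=30)
dataset = dataset.batch(10)
dataset = dataset.prefetch(buffer_size=2)
sample = dataset.make_one_shot_iterator().get_next()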