
I am reproducing the code of TensorFlow's Time series forecasting tutorial.

They use tf.data to shuffle, batch, and cache the dataset. More precisely, they do the following:

import tensorflow as tf

BATCH_SIZE = 256
BUFFER_SIZE = 10000

# x_train_uni, y_train_uni, x_val_uni, y_val_uni are the NumPy arrays
# prepared earlier in the tutorial.
train_univariate = tf.data.Dataset.from_tensor_slices((x_train_uni, y_train_uni))
train_univariate = train_univariate.cache().shuffle(BUFFER_SIZE).batch(BATCH_SIZE).repeat()

val_univariate = tf.data.Dataset.from_tensor_slices((x_val_uni, y_val_uni))
val_univariate = val_univariate.batch(BATCH_SIZE).repeat()

I can't understand why they use repeat() and, even more so, why they don't specify the count argument of repeat. What is the point of making the process repeat indefinitely? And how can the algorithm read all the elements in an infinitely big dataset?
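To make the second question concrete (a toy example of mine, not from the tutorial): an unbounded repeat never signals end-of-sequence, so a plain loop over it would never terminate.

import tensorflow as tf

ds = tf.data.Dataset.range(3).repeat()  # no count: cycles forever

# A bare `for batch in ds:` would never finish; take(7) is only
# here to show the cycling.
print(list(ds.take(7).as_numpy_iterator()))  # [0, 1, 2, 0, 1, 2, 0]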

  • I think this [post](https://stackoverflow.com/questions/57711103/difference-between-tf-data-dataset-repeat-vs-iterator-initializer) should help – CPak Jul 11 '20 at 15:49
  • This might help you. https://stackoverflow.com/questions/53514495/what-does-batch-repeat-and-shuffle-do-with-tensorflow-dataset – hassan zahin Sep 28 '20 at 15:56

1 Answer


As can be seen in the TensorFlow Federated tutorial on image classification, the repeat method repeats the dataset the given number of times, and that count effectively sets the number of epochs for the training.

So use .repeat(NUM_EPOCHS), where NUM_EPOCHS is the number of training epochs. Calling .repeat() with no count repeats the data indefinitely; in that case the training loop has to be told how long an epoch is, which is why the tutorial passes steps_per_epoch to model.fit.
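
A minimal sketch of the two variants (not taken from either tutorial; the toy arrays, the small LSTM model, and the step count of 200 are assumptions for illustration only):

import numpy as np
import tensorflow as tf

BATCH_SIZE = 256
BUFFER_SIZE = 10000
NUM_EPOCHS = 10

# Toy stand-ins for x_train_uni / y_train_uni.
x_train_uni = np.random.rand(1000, 20, 1).astype(np.float32)
y_train_uni = np.random.rand(1000, 1).astype(np.float32)

train_univariate = tf.data.Dataset.from_tensor_slices((x_train_uni, y_train_uni))

# Bounded: yields the whole dataset NUM_EPOCHS times, then signals
# end-of-sequence, so fit() can tell where the data ends by itself.
bounded = train_univariate.cache().shuffle(BUFFER_SIZE).batch(BATCH_SIZE).repeat(NUM_EPOCHS)

# Unbounded, as in the question: the iterator never ends, so fit()
# needs steps_per_epoch to know how many batches form one epoch.
unbounded = train_univariate.cache().shuffle(BUFFER_SIZE).batch(BATCH_SIZE).repeat()

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(8, input_shape=(20, 1)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer='adam', loss='mae')

# One fit() call over the bounded pipeline covers all NUM_EPOCHS passes
# in a single reported "epoch".
model.fit(bounded, epochs=1)

# With the infinite stream, each "epoch" is just 200 batches drawn from it.
model.fit(unbounded, epochs=NUM_EPOCHS, steps_per_epoch=200)

Either way the model sees the same amount of data; the difference is only whether the dataset or the fit() arguments decide where an epoch ends.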