I am reproducing the code from TensorFlow's Time series forecasting tutorial. They use tf.data to shuffle, batch, and cache the dataset. More precisely, they do the following:
import tensorflow as tf

BATCH_SIZE = 256
BUFFER_SIZE = 10000

# cache, shuffle with a 10,000-element buffer, batch, and repeat with no count
train_univariate = tf.data.Dataset.from_tensor_slices((x_train_uni, y_train_uni))
train_univariate = train_univariate.cache().shuffle(BUFFER_SIZE).batch(BATCH_SIZE).repeat()

val_univariate = tf.data.Dataset.from_tensor_slices((x_val_uni, y_val_uni))
val_univariate = val_univariate.batch(BATCH_SIZE).repeat()
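From my own experiments, repeat() without a count seems to make the dataset repeat endlessly; here is a minimal sketch with toy data (my own example, not from the tutorial):

import tensorflow as tf

# A toy 3-element dataset, repeated with no count argument.
ds = tf.data.Dataset.from_tensor_slices([1, 2, 3]).repeat()

# Iterating over ds directly never terminates, so the iteration has to
# be bounded externally, e.g. with take():
for x in ds.take(7):
    print(x.numpy())  # prints 1 2 3 1 2 3 1: the data just cycles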
I can't understand why they use repeat(), and even more so why they don't specify the count argument of repeat(). What is the point of making the pipeline repeat indefinitely? And how can the training algorithm ever read all the elements of an infinitely repeating dataset?
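For context, the tutorial later feeds these datasets to model.fit, roughly like this (reconstructed from memory, so the model name simple_lstm_model and the values of EVALUATION_INTERVAL and EPOCHS are my assumption):

EVALUATION_INTERVAL = 200  # assumed value; the tutorial defines its own
EPOCHS = 10                # assumed value; the tutorial defines its own

simple_lstm_model.fit(train_univariate, epochs=EPOCHS,
                      steps_per_epoch=EVALUATION_INTERVAL,
                      validation_data=val_univariate,
                      validation_steps=50)

Presumably steps_per_epoch is what keeps each epoch finite, but I still don't understand the reason for repeating at all.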