
I am curious how I can set steps_per_epoch in tf.keras fit() when training on a tf.data dataset. Since I need the number of examples to calculate it, I wonder how I can get this.

Since it is of type tf.data, you would assume that this is easier. If I set steps_per_epoch to None, I get "unknown".

Why does using tf.data make life so complicated?

Timbus Calin
ctiid

2 Answers


The previous answer is good, yet I would like to point out two matters:

  1. The code below works; there is no need to use the experimental package anymore.
     ```python
     import tensorflow as tf

     dataset = tf.data.Dataset.range(42)
     # Still prints 42, without the experimental package
     print(dataset.cardinality().numpy())
     ```
  2. If you use a filter predicate, the cardinality may be -2 (tf.data.UNKNOWN_CARDINALITY), i.e. unknown. If you do filter your dataset, make sure you compute its length some other way beforehand (for example, the length of the pandas DataFrame before applying .from_tensor_slices() to it).
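A minimal sketch of the filter case, assuming TensorFlow 2.x. The names evens and num_evens are illustrative only. The fallback shown here, counting elements with one full pass via Dataset.reduce(), is a swapped-in alternative to the answer's suggestion of measuring the source data (e.g. the DataFrame) before building the pipeline; it works for any finite dataset, but it costs one complete iteration, so measuring the source is cheaper for large data.

```python
import tensorflow as tf

dataset = tf.data.Dataset.range(42)
print(dataset.cardinality().numpy())  # 42

# filter() makes the length impossible to know without running the predicate
evens = dataset.filter(lambda x: tf.math.equal(x % 2, 0))
print(evens.cardinality().numpy())  # -2, i.e. tf.data.UNKNOWN_CARDINALITY

# Fallback: count the elements by iterating once over the dataset
num_evens = int(evens.reduce(0, lambda count, _: count + 1).numpy())
print(num_evens)  # 21
```

The same counting approach also applies to other transformations whose output size is unknown, such as flat_map().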

Another important point is how to set the parameters steps_per_epoch and validation_steps: steps_per_epoch = length_of_training_dataset // batch_size and validation_steps = length_of_validation_dataset // batch_size.
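The formula can be wrapped in a small helper. This is a hypothetical sketch: steps_for and the dataset sizes (1000 training and 250 validation examples, batch size 32) are made up for illustration. Floor division skips the final partial batch in each epoch; math.ceil also counts it, which matches a dataset batched without drop_remainder=True.

```python
import math


def steps_for(num_examples: int, batch_size: int, drop_remainder: bool = True) -> int:
    """Hypothetical helper: number of batches in one full pass over the data."""
    if drop_remainder:
        # The formula from the answer: the last partial batch is skipped
        return num_examples // batch_size
    # Round up so the last, smaller batch is also counted
    return math.ceil(num_examples / batch_size)


steps_per_epoch = steps_for(1000, 32)                             # 31
validation_steps = steps_for(250, 32)                             # 7
steps_per_epoch_full = steps_for(1000, 32, drop_remainder=False)  # 32
print(steps_per_epoch, validation_steps, steps_per_epoch_full)
```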

A full example is available here: How to use repeat() function when building data in Keras?

Timbus Calin
  • Thank you! But I am really confused. I use repeat(2) after the dataset (actually I don't really know what this does). How do I set steps_per_epoch now to get all the data in each epoch? Do I now set the step count to twice the previous value because of repeat(2)? – ctiid Sep 25 '20 at 07:59
  • steps_per_epoch = length_of_training_dataset // batch_size, validation_steps = length_of_validation_dataset // batch_size – Timbus Calin Sep 25 '20 at 08:02
  • I get -2. My dataset is of type FlatMapDataset. – ctiid Sep 25 '20 at 08:41
  • Told you it could happen. Then you need to calculate the length beforehand somehow. In a way I understand why it returns -2: when you flatten, you don't know how many elements you will get. Make sure you calculate the length beforehand to avoid any problems. – Timbus Calin Sep 25 '20 at 08:44
  • So, as I understand it, working with a plain tf.data dataset without repeat() just goes through the whole data at most once per epoch? So I need repeat() and set steps_per_epoch to the total number of batches? – ctiid Sep 25 '20 at 10:49
  • Regardless of tf.data, when using .fit(), steps_per_epoch = dataset_length // batch_size, and yes, this is the total number of batches for a given batch_size. – Timbus Calin Sep 25 '20 at 10:50
  • And you need to call .repeat() on the actual training_set and validation_set that you pass to the .fit() method. – Timbus Calin Sep 25 '20 at 10:52
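To make the repeat() discussion concrete, here is a minimal sketch assuming TensorFlow 2.x with tf.keras; the random data, the 100-example size, and the one-layer model are hypothetical. It first shows that repeat(2) just doubles the number of elements, so it does not change how many steps make up one pass. The batched training set is then repeated indefinitely, so its cardinality becomes -1 (tf.data.INFINITE_CARDINALITY) and fit() relies entirely on steps_per_epoch, which is set to one full pass over the data.

```python
import tensorflow as tf

# repeat(2) concatenates two passes: 42 elements become 84
two_passes = tf.data.Dataset.range(42).repeat(2)
print(two_passes.cardinality().numpy())  # 84

num_examples = 100  # hypothetical training-set size, known in advance
batch_size = 10

# Hypothetical random regression data: 4 features, 1 target
x = tf.random.normal((num_examples, 4))
y = tf.random.normal((num_examples, 1))

# Batch, then repeat() with no argument: the dataset never runs out
train_ds = (
    tf.data.Dataset.from_tensor_slices((x, y))
    .shuffle(num_examples)
    .batch(batch_size)
    .repeat()
)
print(train_ds.cardinality().numpy())  # -1, i.e. tf.data.INFINITE_CARDINALITY

# One epoch = one full pass over the data = 100 // 10 = 10 batches
steps_per_epoch = num_examples // batch_size

model = tf.keras.Sequential([tf.keras.Input(shape=(4,)), tf.keras.layers.Dense(1)])
model.compile(optimizer="adam", loss="mse")
history = model.fit(train_ds, epochs=2, steps_per_epoch=steps_per_epoch, verbose=0)
print(len(history.history["loss"]))  # 2, one loss value per epoch
```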

Try tf.data.experimental.cardinality:

```python
import tensorflow as tf

dataset = tf.data.Dataset.range(42)
print(tf.data.experimental.cardinality(dataset).numpy())
```

Output:

42
Vaziri-Mahmoud