My question: my training data is stored as one `TFRecord` file of 333 GB, and one epoch takes 3 hours to finish training. What is the best way to split my data in order to improve the speed or performance of the input pipeline?
- Split the original dataset (which is in a CSV file) into 10 splits, then create 10 TFRecord files (see the first sketch after this list).
- Split the created single TFRecord file into multiple shards through `tf.data.Dataset.shard` (see the second sketch). If this option is better, how should I deal with the sharded dataset within Keras? Should I create 10 iterators (one iterator per shard)? I mean, unlike option one I will not have 10 TFRecord files saved on disk; I will still have one TFRecord file and can only get one shard of it at a time.
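
For context, this is roughly how I imagine consuming the 10 TFRecord files from option one. It is a minimal sketch: the file names and the `parse_example` feature spec are hypothetical placeholders for my real schema, and I am assuming `interleave` with `AUTOTUNE` is the standard way to read several files in parallel:

```python
import tensorflow as tf

# Hypothetical parser; the feature spec would be replaced with my real schema.
def parse_example(serialized):
    features = {
        "feature": tf.io.FixedLenFeature([10], tf.float32),
        "label": tf.io.FixedLenFeature([], tf.int64),
    }
    parsed = tf.io.parse_single_example(serialized, features)
    return parsed["feature"], parsed["label"]

# Hypothetical names for the 10 TFRecord files produced from the CSV splits.
filenames = [f"train_{i:02d}.tfrecord" for i in range(10)]

dataset = (
    tf.data.Dataset.from_tensor_slices(filenames)
    .shuffle(len(filenames))              # reshuffle file order each epoch
    .interleave(
        tf.data.TFRecordDataset,          # open each file as a record stream
        cycle_length=4,                   # read 4 files concurrently
        num_parallel_calls=tf.data.AUTOTUNE,
    )
    .map(parse_example, num_parallel_calls=tf.data.AUTOTUNE)
    .batch(256)
    .prefetch(tf.data.AUTOTUNE)
)
```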
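
And this is how I understand option two would look, sketched with a hypothetical path to the existing file. As far as I can tell, each shard still iterates over the entire file and just discards the records belonging to the other shards, which is part of why I am unsure this option actually helps with I/O:

```python
import tensorflow as tf

# Hypothetical path to the existing single 333 GB TFRecord file.
full = tf.data.TFRecordDataset("train.tfrecord")

# shard(num_shards, index) keeps every 10th record starting at `index`,
# yielding 10 logical shards of the same underlying file.
shards = [full.shard(num_shards=10, index=i) for i in range(10)]

# Iterating one shard still scans the whole file and skips the
# records that belong to the other 9 shards.
for record in shards[0].take(1):
    print(record)
```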