My question: my training data is stored as a single TFRecord file of 333 GB, and one epoch takes 3 hours to train. What is the best way to split my data in order to improve the speed or performance of the input pipeline?

  1. Split the original dataset (which is in a CSV file) into 10 parts and create 10 TFRecord files (see the first sketch after this list).
  2. Shard the existing single TFRecord file into multiple pieces through tf.data.Dataset.shard (see the second sketch after this list). If this option is better, how should I deal with the sharded dataset within Keras? Should I create 10 iterators (one iterator per shard)? I mean, unlike option one I will not have 10 TFRecord files saved on disk; I will have one TFRecord file and can only get one shard at a time.
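
For concreteness, here is a minimal sketch of what I mean by option 1. The file names, shard count, and CSV columns ("x1", "x2", "label") are placeholders for my real data:

```python
import csv
import tensorflow as tf

NUM_SHARDS = 10  # number of output TFRecord files

# One writer per output shard; the naming scheme is just an assumption.
writers = [
    tf.io.TFRecordWriter("train-%02d-of-%02d.tfrecord" % (i, NUM_SHARDS))
    for i in range(NUM_SHARDS)
]

with open("train.csv") as f:
    for i, row in enumerate(csv.DictReader(f)):
        # Placeholder columns; my real CSV has more fields.
        example = tf.train.Example(features=tf.train.Features(feature={
            "inputs": tf.train.Feature(float_list=tf.train.FloatList(
                value=[float(row["x1"]), float(row["x2"])])),
            "label": tf.train.Feature(int64_list=tf.train.Int64List(
                value=[int(row["label"])])),
        }))
        # Distribute rows round-robin across the shards.
        writers[i % NUM_SHARDS].write(example.SerializeToString())

for w in writers:
    w.close()
```

On the reading side I would then interleave the 10 files (e.g. tf.data.Dataset.list_files("train-*.tfrecord") followed by interleave(tf.data.TFRecordDataset, ...)) and feed the single resulting dataset to model.fit.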
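
And here is a sketch of what I imagine option 2 would look like, with the shards interleaved back into one dataset so that Keras still sees a single input (the feature spec matches the placeholder columns above):

```python
import tensorflow as tf

NUM_SHARDS = 10
AUTOTUNE = tf.data.experimental.AUTOTUNE

# Hypothetical feature spec matching the writer sketch above.
feature_spec = {
    "inputs": tf.io.FixedLenFeature([2], tf.float32),
    "label": tf.io.FixedLenFeature([], tf.int64),
}

def parse(serialized):
    parsed = tf.io.parse_single_example(serialized, feature_spec)
    return parsed["inputs"], parsed["label"]

# shard(n, i) keeps every n-th record starting at offset i, so the
# 10 shards together partition the single 333 GB file.
dataset = tf.data.Dataset.range(NUM_SHARDS).interleave(
    lambda i: tf.data.TFRecordDataset("train.tfrecord").shard(NUM_SHARDS, i),
    cycle_length=NUM_SHARDS,
    num_parallel_calls=AUTOTUNE,
)

dataset = (dataset.map(parse, num_parallel_calls=AUTOTUNE)
                  .batch(256)
                  .prefetch(AUTOTUNE))

# model.fit(dataset, epochs=...)  # assumes a compiled Keras model
```

Is this the right way to consume the shards, or do I really need 10 separate iterators?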
W. Sam
  • This question seems related: https://stackoverflow.com/questions/54519309/split-tfrecords-file-into-many-tfrecords-files – xdhmoore Jan 22 '21 at 02:13

0 Answers