I am reading the code in the TensorFlow benchmarks repo. The following piece of code creates a TensorFlow dataset from TFRecord files:
ds = tf.data.TFRecordDataset.list_files(tfrecord_file_names)
ds = ds.apply(interleave_ops.parallel_interleave(tf.data.TFRecordDataset, cycle_length=10))
I am trying to change this code to create the dataset directly from JPEG image files:
ds = tf.data.Dataset.from_tensor_slices(jpeg_file_names)
ds = ds.apply(interleave_ops.parallel_interleave(?, cycle_length=10))
I don't know what to write in place of the ?. For TFRecord files, the map_func passed to parallel_interleave() is the constructor (__init__()) of the tf.data.TFRecordDataset class, but I don't know what the equivalent is for JPEG files.
We don't need to do any transformations at this point, because we zip two datasets and apply the transformations later. The code is as follows:
counter = tf.data.Dataset.range(batch_size)
ds = tf.data.Dataset.zip((ds, counter))
ds = ds.apply(
    batching.map_and_batch(
        map_func=preprocess_fn,
        batch_size=batch_size,
        num_parallel_batches=num_splits))
Because no transformation is needed in the ? place, I tried passing an empty map_func, but that fails with the error "`map_func` must return a `Dataset` object". I also tried passing tf.data.Dataset itself, but the error says that Dataset is an abstract class and cannot be used there.
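To make the constraint concrete, here is a minimal pure-Python sketch of how I understand the interleave contract. It is a toy model, not TensorFlow's implementation: `interleave` and `wrap_file_name` are hypothetical names I made up, and plain lists stand in for datasets. The point is that map_func has to return something iterable for every input element, so a "do nothing" map_func would still need to wrap each file name in a one-element collection. My guess is that the TensorFlow analogue would be a map_func that wraps the file name with tf.data.Dataset.from_tensors, but I have not verified that against the benchmarks code.

```python
from itertools import islice


def interleave(elements, map_func, cycle_length):
    """Toy model of interleave: map_func must return an iterable per element."""
    source = iter(elements)
    # Open up to cycle_length sub-iterators at once.
    active = [iter(map_func(e)) for e in islice(source, cycle_length)]
    while active:
        next_round = []
        for sub in active:
            try:
                yield next(sub)
            except StopIteration:
                # This sub-iterator is exhausted: open the next input element, if any.
                for e in islice(source, 1):
                    next_round.append(iter(map_func(e)))
                continue
            next_round.append(sub)
        active = next_round


def wrap_file_name(name):
    # Hypothetical "no-op" map_func: wraps the file name in a one-element list,
    # standing in for a single-element dataset.
    return [name]


jpeg_file_names = ["a.jpg", "b.jpg", "c.jpg"]  # placeholder file names
result = list(interleave(jpeg_file_names, wrap_file_name, cycle_length=2))
```

In this model, a map_func that returns nothing (None) fails as soon as interleave tries to iterate over its result, which looks like the same kind of failure as the error quoted above.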
Can anyone help with this? Thanks very much.