I want to build an input pipeline that feeds non-standard files (for example, files with the extension *.xxx) to a neural network. My code is currently structured as follows:
1) I define a list of paths to the training files
2) I create a tf.data.Dataset from these paths
3) I map a Python function over the Dataset; it takes each path and returns the corresponding NumPy array (loaded from disk), which has shape [256, 256, 192].
4) I define an initializable iterator and then use it during network training.
My question is about the size of the batch I provide to the network. I would like to feed the network batches of size 64. How can I do this? For example, if I call train_data.batch(b_size) with b_size = 1, the iterator yields one element of shape [256, 256, 192] on each iteration; what if I want to feed the network only 64 slices of this array?
This is an extract of my code:
    import tensorflow as tf  # TensorFlow 1.x API

    with tf.name_scope('data'):
        train_filenames = tf.constant(list_of_files_train)
        train_data = tf.data.Dataset.from_tensor_slices(train_filenames)
        # Load each file with a Python function that returns a float32 array.
        train_data = train_data.map(lambda filename: tf.py_func(
            self._parse_xxx_data, [filename], [tf.float32]))
        # Dataset transformations return a new dataset, so the result must be reassigned.
        train_data = train_data.shuffle(buffer_size=len(list_of_files_train))
        train_data = train_data.batch(b_size)
        iterator = tf.data.Iterator.from_structure(train_data.output_types, train_data.output_shapes)
        input_data = iterator.get_next()
        train_init = iterator.make_initializer(train_data)

    [...]

    with tf.Session() as sess:
        sess.run(train_init)
        _ = sess.run([self.train_op])
Thanks in advance
----------
I posted a solution to my problem in the comments below. I would still be happy to receive any comment or suggestion on possible improvements. Thank you ;)
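
For anyone reading along, here is a minimal NumPy sketch of the "64 slices" idea from the question. It is not the solution referred to above. It assumes the slices are taken along the last axis of the [256, 256, 192] array (192 slices of 256 x 256, which would give three batches of 64). `iter_slice_batches` and its parameters are hypothetical names, and a small stand-in array replaces the real data.

```python
import numpy as np


def iter_slice_batches(volume, batch_size=64, axis=-1):
    """Yield batches of 2-D slices taken along `axis` of a 3-D volume."""
    # Move the slicing axis to the front, e.g. [256, 256, 192] -> [192, 256, 256].
    slices = np.moveaxis(volume, axis, 0)
    # Step through the slices in chunks of `batch_size`.
    for start in range(0, slices.shape[0], batch_size):
        yield slices[start:start + batch_size]


# Small stand-in volume (assumption); the real array would be [256, 256, 192].
volume = np.arange(4 * 4 * 6, dtype=np.float32).reshape(4, 4, 6)
batches = list(iter_slice_batches(volume, batch_size=2))
batch_shapes = [b.shape for b in batches]
```

Inside a tf.data pipeline, the same effect could probably be obtained by flattening each loaded volume into its slices before batching, for example with flat_map and tf.data.Dataset.from_tensor_slices followed by batch(64). Tensors returned by tf.py_func carry no static shape, so the shape may need to be set explicitly before that step.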