I have a very big data set for training.
I'm using the Dataset API like so:
self._dataset = tf.contrib.data.Dataset.from_tensor_slices((self._images_list, self._labels_list))
self._dataset = self._dataset.map(self.load_image)
self._dataset = self._dataset.batch(batch_size)
self._dataset = self._dataset.shuffle(buffer_size=shuffle_buffer_size)  # note: this shuffles whole batches, not individual examples
self._dataset = self._dataset.repeat()
self._iterator = self._dataset.make_one_shot_iterator()
If I use only a small portion of my data for training, everything works fine. If I use all of my data, TensorFlow crashes with this error: ValueError: GraphDef cannot be larger than 2GB.
It seems like TensorFlow is trying to store all the data inside the graph instead of loading only the data it needs, but I'm not sure...
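A quick way to confirm this is to check the size of the serialized GraphDef after building the pipeline (a minimal sketch; it assumes the pipeline above lives in the default graph):

# from_tensor_slices() turns the lists it is given into constants that live
# inside the graph, so a large in-memory data set shows up directly in this size.
graph_def = tf.get_default_graph().as_graph_def()
print("GraphDef size: %.1f MB" % (graph_def.ByteSize() / 1e6))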
Any advice will be great!
Update: found a solution/workaround.
According to this post: Tensorflow Dataset API doubles graph protobuff filesize
I replaced make_one_shot_iterator() with make_initializable_iterator() and, of course, called the iterator's initializer after creating the session:
init = tf.global_variables_initializer()
sess.run(init)
sess.run(train_data._iterator.initializer)  # initialize the dataset iterator after the variables
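For completeness, the pattern the TensorFlow guide on importing data recommends for large in-memory arrays is to feed them through placeholders when the iterator is initialized, so the data is never embedded in the GraphDef at all. A rough sketch of how that would look with my pipeline (load_image, batch_size, shuffle_buffer_size, images_list and labels_list stand in for my own code, and I'm assuming string paths and integer labels):

import tensorflow as tf

# Feed the lists through placeholders so the data arrives via feed_dict when
# the iterator is initialized, instead of being baked into the GraphDef as
# constants by from_tensor_slices().
images_ph = tf.placeholder(tf.string, shape=[None])
labels_ph = tf.placeholder(tf.int64, shape=[None])

dataset = tf.contrib.data.Dataset.from_tensor_slices((images_ph, labels_ph))
dataset = dataset.map(load_image)   # same map/batch/shuffle/repeat as above
dataset = dataset.batch(batch_size)
dataset = dataset.shuffle(buffer_size=shuffle_buffer_size)
dataset = dataset.repeat()

iterator = dataset.make_initializable_iterator()
next_batch = iterator.get_next()

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # The actual data only flows in through feed_dict here, not through the graph.
    sess.run(iterator.initializer,
             feed_dict={images_ph: images_list, labels_ph: labels_list})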
But I'm leaving the question open, since to me this still seems like a workaround rather than a real solution...