Is there anything comparable to tf.FixedLengthRecordReader, except that the data is loaded from a tensor instead of from a file? I am trying to build an input pipeline that looks like this (my problem is described under point 4):
1. Load data into dictionaries
...
# Each dictionary contains two key/value pairs:
# b'images' -> list of arrays
# b'labels' -> list of integers
dict_1 = unpickle(path_1)
dict_2 = unpickle(path_2)
...
dict_n = unpickle(path_n)
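unpickle is the usual CIFAR-10-style loader; a minimal sketch, assuming the files were pickled in the standard way:
import pickle

def unpickle(path):
    # With encoding='bytes' the dictionary keys come back as bytes
    # objects, which is why they appear as b'images' / b'labels'.
    with open(path, 'rb') as f:
        return pickle.load(f, encoding='bytes')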
2. Create a new dictionary
# Select certain individual data points from the N dictionaries
# and merge them into a new dictionary or array.
..
dict_new = ...
....
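This step is not where my problem is; it is roughly something like the following (a sketch with a hypothetical merge_dicts helper and index lists, not my exact code):
def merge_dicts(dicts, indices):
    # Pick the selected data points from each source dictionary
    # and collect them into one new dictionary.
    dict_new = {b'images': [], b'labels': []}
    for d, idx in zip(dicts, indices):
        for i in idx:
            dict_new[b'images'].append(d[b'images'][i])
            dict_new[b'labels'].append(d[b'labels'][i])
    return dict_new

dict_new = merge_dicts([dict_1, dict_2], [[0, 5, 7], [1, 2]])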
3. Create a tensor with training data points
class PRELOADdata(object):
    pass

def get_pre_load_data():
    pre_load_data = PRELOADdata()
    # Images
    dict_value_img = dict_new[b'images']
    array_image = np.asarray(dict_value_img, np.float32)
    pre_load_data.images = tf.convert_to_tensor(array_image, np.float32)
    # Labels (int32, since they are class indices)
    dict_value_lbl = dict_new[b'labels']
    array_label = np.asarray(dict_value_lbl, np.int32)
    pre_load_data.labels = tf.convert_to_tensor(array_label, np.int32)
    ...
    return pre_load_data
4. Here I need help :)
At this point I would like to use the preloaded data like a file that is read with the read() function of tf.FixedLengthRecordReader. In my current solution, the whole data set ends up in a single batch.
class DATABASERecord(object):
    pass

def get_result():
    result = DATABASERecord()
    database = get_pre_load_data()
    ... ???
    result.image = ...
    result.label = ...
    return result
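One thing I came across is tf.train.slice_input_producer, which slices preloaded tensors along their first dimension and dequeues one data point at a time, so maybe it could fill the ??? part above. A sketch of how I would try it (not tested):
# Each dequeue yields one (image, label) pair from the preloaded tensors.
image, label = tf.train.slice_input_producer(
    [database.images, database.labels], shuffle=True)
result.image = image
result.label = label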
5. Do some operations on the 'result'
data_point = get_result()
label = data_point.label
image = tf.cast(data_point.image, tf.float32)
#... tf.random_crop, tf.image.random_flip_left_right,
#... tf.image.random_brightness, tf.image.random_contrast,
#... tf.image.per_image_standardization
...
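Spelled out, the distortion part is roughly this (a sketch assuming 32x32x3 images and the crop size from the CIFAR-10 tutorial):
distorted = tf.random_crop(image, [24, 24, 3])
distorted = tf.image.random_flip_left_right(distorted)
distorted = tf.image.random_brightness(distorted, max_delta=63)
distorted = tf.image.random_contrast(distorted, lower=0.2, upper=1.8)
# Standardize to zero mean / unit variance per image.
float_image = tf.image.per_image_standardization(distorted)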
6. Create batch and queue
...
image_batch, label_batch = tf.train.batch(
    [image, label], batch_size=BATCH_SIZE,
    num_threads=THREADS, capacity=BA_CAPACITY * BATCH_SIZE)
...
batch_queue = tf.contrib.slim.prefetch_queue.prefetch_queue(
[image_batch, label_batch], capacity=QU_CAPACITY)
...
..batch_queue.dequeue()
...
tf.train.start_queue_runners(sess=my_sess)
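Put together, this is roughly how I drive the queues; the Coordinator part is the standard pattern for shutting the runner threads down cleanly (sketch):
images, labels = batch_queue.dequeue()
with tf.Session() as my_sess:
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=my_sess, coord=coord)
    img_val, lbl_val = my_sess.run([images, labels])
    coord.request_stop()
    coord.join(threads)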
I don't know if it's relevant, but the whole thing runs on a multi-GPU system.
EDIT:
I don't have an answer to my question yet, but I do have a workaround for the underlying problem: instead of storing the data points in a tensor, I save them in a binary file and load them with tf.FixedLengthRecordReader. This answer helped me a lot: Attach a queue to a numpy array in tensorflow for data fetch instead of files?
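Roughly, the workaround looks like this; a sketch assuming one uint8 label byte followed by the raw uint8 image bytes (32*32*3, height x width x channels) per record:
# Write: one record = 1 label byte + the raw image bytes.
with open('data.bin', 'wb') as f:
    for lbl, img in zip(array_label.astype(np.uint8),
                        array_image.astype(np.uint8)):
        f.write(lbl.tobytes())
        f.write(img.tobytes())

# Read back with a fixed record length, as in the CIFAR-10 tutorial.
record_bytes = 1 + 32 * 32 * 3
filename_queue = tf.train.string_input_producer(['data.bin'])
reader = tf.FixedLengthRecordReader(record_bytes=record_bytes)
key, value = reader.read(filename_queue)
raw = tf.decode_raw(value, tf.uint8)
label = tf.cast(tf.strided_slice(raw, [0], [1]), tf.int32)
image = tf.reshape(tf.strided_slice(raw, [1], [record_bytes]), [32, 32, 3])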