
Is there anything comparable to tf.FixedLengthRecordReader, with the only difference that the data is loaded from a tensor instead of a file? I am trying to build an input pipeline that looks like this (my problem is described under point 4):

1. Load data in dictionaries

...
# Each dictionary contains two key/value pairs:
# [b'images'] / List_of_Arrays
# [b'labels'] / List_of_Integers
dict_1 = unpickle(path_1)
dict_2 = unpickle(path_2)
...
dict_n = unpickle(path_n)
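
For reference, a minimal sketch of what unpickle might look like, assuming CIFAR-10-style pickle files (the byte-string keys suggest pickle.load with encoding='bytes' under Python 3):

import pickle

def unpickle(path):
    # Returns a dict with byte-string keys such as b'images' and b'labels'
    with open(path, 'rb') as f:
        return pickle.load(f, encoding='bytes')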

2. Create new dictionary

# Select certain individual data points from the N dictionaries 
# and merge them into a new dictionary or array.
..
dict_new = ...
....
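
A minimal sketch of this merging step, assuming the selection simply concatenates the lists from all n dictionaries (the actual selection logic is elided above):

# Hypothetical merge: concatenate the lists from all dictionaries.
# Any selection of individual data points would go here instead.
all_dicts = [dict_1, dict_2, dict_n]
dict_new = {b'images': [], b'labels': []}
for d in all_dicts:
    dict_new[b'images'].extend(d[b'images'])
    dict_new[b'labels'].extend(d[b'labels'])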

3. Create a tensor with training data points

class PRELOADdata(object):
  pass
pre_load_data = PRELOADdata()

# Images
dict_value_img = dict_new[b'images']
array_image = np.asarray(dict_value_img, np.float32)
pre_load_data.images = tf.convert_to_tensor(array_image, np.float32)

# Labels
dict_value_lbl = dict_new[b'labels']
array_label = np.asarray(dict_value_lbl, np.float32)
pre_load_data.labels = tf.convert_to_tensor(array_label, np.float32)
...
return pre_load_data

4. Here I need help :)

At this point I would like to use the database similarly to a file that is read with the read() function of tf.FixedLengthRecordReader, i.e. one record at a time. In my current solution, the whole data set ends up packed into a single batch.

class DATABASERecord(object):
  pass
result = DATABASERecord()

database = get_pre_load_data()
... ???
result.image = ..
result.label = ..

return result
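
One candidate I have seen mentioned for exactly this situation (an assumption on my part, not a confirmed solution) is tf.train.slice_input_producer, which enqueues individual slices of an in-memory tensor and therefore plays roughly the role that read() plays for a file-based reader:

# Sketch: slice_input_producer dequeues one (image, label) pair at a
# time from the preloaded tensors, analogous to one read() from a file.
database = get_pre_load_data()
image_slice, label_slice = tf.train.slice_input_producer(
    [database.images, database.labels], shuffle=True)

result = DATABASERecord()
result.image = image_slice
result.label = label_slice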

5. Do some operations on the 'result'

data_point = get_result()
label = data_point.label
image = tf.cast(data_point.image, tf.int32)
#... tf.random_crop, tf.image.random_flip_left_right, 
#... tf.image.random_brightness, tf.image.random_contrast,
#... tf.image.per_image_standardization
...
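
For completeness, a sketch of how those ops could be chained; the 24x24 crop size and the perturbation parameters are placeholders I chose, not values from my pipeline:

# Hypothetical preprocessing chain; all sizes/parameters are examples.
image = tf.cast(data_point.image, tf.float32)
image = tf.random_crop(image, [24, 24, 3])
image = tf.image.random_flip_left_right(image)
image = tf.image.random_brightness(image, max_delta=63)
image = tf.image.random_contrast(image, lower=0.2, upper=1.8)
image = tf.image.per_image_standardization(image)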

6. Create Batch, QUEUE ..

...
image_batch, label_batch = tf.train.batch(
    [image, label], batch_size=BATCH_SIZE,
    num_threads=THREADS, capacity=BA_CAPACITY * BATCH_SIZE)
...
batch_queue = tf.contrib.slim.prefetch_queue.prefetch_queue(
    [image_batch, label_batch], capacity=QU_CAPACITY)
...
..batch_queue.dequeue()
...
tf.train.start_queue_runners(sess=my_sess)
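
For reference, a sketch of the session boilerplate around the queue runners, assuming the usual TF 1.x coordinator pattern:

# Hypothetical session setup; batch_queue comes from the code above.
with tf.Session() as my_sess:
    my_sess.run(tf.global_variables_initializer())
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=my_sess, coord=coord)
    try:
        images, labels = my_sess.run(batch_queue.dequeue())
    finally:
        coord.request_stop()
        coord.join(threads)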

I don't know if it's relevant, but the whole thing runs on a multi-GPU system.

EDIT:

I don't have an answer to the question yet, but I have a workaround that solves the underlying problem. Instead of storing the data points in a tensor, I save them in a binary file and load them with tf.FixedLengthRecordReader. This answer helped me a lot: Attach a queue to a numpy array in tensorflow for data fetch instead of files?
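
A sketch of that workaround, assuming CIFAR-10-style records of 1 label byte followed by 32*32*3 image bytes (the sizes and byte layout are my assumption):

import numpy as np
import tensorflow as tf

# Write each record as 1 label byte + the raw image bytes.
def write_binary(path, images, labels):
    with open(path, 'wb') as f:
        for img, lbl in zip(images, labels):
            f.write(np.uint8(lbl).tobytes())
            f.write(img.astype(np.uint8).tobytes())

# Read the records back, as in the CIFAR-10 tutorial.
record_bytes = 1 + 32 * 32 * 3
filename_queue = tf.train.string_input_producer(['data.bin'])
reader = tf.FixedLengthRecordReader(record_bytes=record_bytes)
key, value = reader.read(filename_queue)
raw = tf.decode_raw(value, tf.uint8)
label = tf.cast(raw[0], tf.int32)
image = tf.reshape(raw[1:], [32, 32, 3])  # assumes HWC byte order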
