Effective reading of own images in tensorflow

Question

I've skimmed all the TensorFlow tutorials, and in each of them the data set is small enough to be loaded into RAM. My own data (~30 GB of images) cannot be loaded into memory, so I'm looking for an efficient way to read images for further processing. Could anyone provide an example of how to do that?

P.S. I have two files, train_images and validation_images, in which every line has the form:

<path/to/img> <label>

use method 1 or 2 from here: https://www.tensorflow.org/versions/r0.7/how_tos/reading_data/index.html#reading-data — Yaroslav Bulatov, Apr 06 '16 at 12:31

score 2 · Accepted Answer · edited May 23 '17 at 12:09

This is what you're looking for: Tensorflow read images with labels

The exact code snippet is like this:

import tensorflow as tf

def read_labeled_image_list(image_list_file):
    """Reads a .txt file containing paths and labels.
    Args:
       image_list_file: a .txt file with one "/path/to/image label" pair per line
    Returns:
       Two lists: the filenames and the integer labels found in image_list_file
    """
    filenames = []
    labels = []
    with open(image_list_file, 'r') as f:
        for line in f:
            # strip() removes the newline without cutting a character
            # off a last line that has no trailing '\n'
            filename, label = line.strip().split(' ')
            filenames.append(filename)
            labels.append(int(label))
    return filenames, labels

def read_images_from_disk(input_queue):
    """Consumes a single (filename, label) pair from the input queue.
    Args:
      input_queue: a list of two tensors, [filename, label], as produced
        by tf.train.slice_input_producer.
    Returns:
      Two tensors: the decoded image and the label.
    """
    label = input_queue[1]
    file_contents = tf.read_file(input_queue[0])
    # decode_png expects PNG data; use tf.image.decode_jpeg for JPEG files
    example = tf.image.decode_png(file_contents, channels=3)
    return example, label

# Reads paths of images together with their labels
# (filename is the path to the list file, e.g. train_images)
image_list, label_list = read_labeled_image_list(filename)

images = tf.convert_to_tensor(image_list, dtype=tf.string)
labels = tf.convert_to_tensor(label_list, dtype=tf.int32)

# Makes an input queue
input_queue = tf.train.slice_input_producer([images, labels],
                                            num_epochs=num_epochs,
                                            shuffle=True)

image, label = read_images_from_disk(input_queue)

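# Optional preprocessing or data augmentation
# (preprocess_image and preprocess_label are placeholders for your own functions;
# tf.image implements most of the standard image augmentations).
# tf.train.batch below needs a fully defined image shape, so resize here
# or call set_shape on the image.
image = preprocess_image(image)
label = preprocess_label(label)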

# Optional Image and Label Batching
image_batch, label_batch = tf.train.batch([image, label],
                                          batch_size=batch_size)
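
To see what the queue above does without the TensorFlow machinery, here is a minimal plain-Python sketch of the same idea: only the (path, label) pairs stay in memory, they are reshuffled every epoch, and images are loaded one batch at a time. This is an illustration, not the TensorFlow implementation. The names iterate_batches and load_fn are hypothetical, load_fn stands in for whatever decoder you use, and dropping a trailing partial batch is an assumption of this sketch.

```python
import random


def iterate_batches(filenames, labels, batch_size, num_epochs=1,
                    load_fn=lambda path: path, seed=None):
    """Yield (images, labels) batches, loading images only when needed.

    load_fn is a hypothetical hook: it receives a path and returns the
    decoded image. The default simply returns the path unchanged.
    """
    rng = random.Random(seed)
    pairs = list(zip(filenames, labels))
    for _ in range(num_epochs):
        rng.shuffle(pairs)  # new order for every epoch
        # Step through full batches only; a trailing partial batch is dropped.
        for start in range(0, len(pairs) - batch_size + 1, batch_size):
            chunk = pairs[start:start + batch_size]
            images = [load_fn(path) for path, _ in chunk]
            batch_labels = [label for _, label in chunk]
            yield images, batch_labels
```

Passing the two lists returned by read_labeled_image_list together with your own load_fn gives batches whose memory footprint depends on batch_size, not on the size of the data set.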

score 0 · Answer 2 · answered Apr 06 '16 at 11:48 · edited Apr 10 '16 at 03:16 · Ashish Awasthi

The Udacity tutorial explains a stochastic (minibatch) method in https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/udacity/4_convolutions.ipynb. You can use the same approach with one change: instead of saving all images in a single pickle file, save them in chunks of the batch_size you are using. That way, you only ever load as much data as one batch needs.
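
A minimal sketch of the chunking idea, using only the standard library. The function names, the file naming pattern, and the shape of a sample (any picklable object, such as an (image, label) pair) are assumptions made for illustration. For brevity, save_in_chunks takes an in-memory list; with a data set that does not fit in RAM you would build and write each chunk from the list file in turn.

```python
import os
import pickle


def save_in_chunks(samples, batch_size, out_dir, prefix="chunk"):
    """Write samples to out_dir as pickle files of at most batch_size items."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for i in range(0, len(samples), batch_size):
        # Hypothetical naming pattern: chunk_00000.pickle, chunk_00001.pickle, ...
        path = os.path.join(out_dir, "%s_%05d.pickle" % (prefix, i // batch_size))
        with open(path, "wb") as f:
            pickle.dump(samples[i:i + batch_size], f, pickle.HIGHEST_PROTOCOL)
        paths.append(path)
    return paths


def load_chunks(paths):
    """Yield one chunk at a time, so only one batch is held in memory."""
    for path in paths:
        with open(path, "rb") as f:
            yield pickle.load(f)
```

During training you would loop over load_chunks(paths) and feed each chunk as one batch. Shuffling the list of paths between epochs changes the order of batches but not their contents.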

score 0 · Answer 3 · answered Apr 06 '16 at 12:31

The recommended way is to put the data into sharded protobuf files, where the encoded JPEG and the label(s) are features of a tf.Example. build_image_data.py in the tensorflow/models repository shows how to create such a database of image/label pairs from a directory structure; you'll need to adapt it a bit to your case (it's straightforward). For training, look at image_processing.py, which shows how to go from the tf.Example proto to image/label tensors: extract the encoded JPEG and label from the Example record, decode the JPEG, resize, apply augmentations as needed, then enqueue.
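
The sharding step on its own can be sketched in plain Python. This only plans which (filename, label) pairs go into which shard; serializing each pair into a tf.Example and writing the record files is left to the scripts mentioned above. All function names and the shard naming pattern here are assumptions made for illustration, not taken from those scripts.

```python
def shard_ranges(num_items, num_shards):
    """Split range(num_items) into num_shards contiguous [start, end) ranges."""
    bounds = [(num_items * s) // num_shards for s in range(num_shards + 1)]
    return [(bounds[s], bounds[s + 1]) for s in range(num_shards)]


def shard_filename(name, shard, num_shards):
    """Hypothetical naming pattern, e.g. 'train-00000-of-00004'."""
    return "%s-%05d-of-%05d" % (name, shard, num_shards)


def plan_shards(filenames, labels, name, num_shards):
    """Map each shard file name to the (filename, label) pairs it should hold."""
    pairs = list(zip(filenames, labels))
    plan = {}
    for s, (start, end) in enumerate(shard_ranges(len(pairs), num_shards)):
        plan[shard_filename(name, s, num_shards)] = pairs[start:end]
    return plan
```

Shuffling the pairs once before planning spreads the classes across shards, which matters if the list file is sorted by label.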
