I've skimmed through all the TensorFlow tutorials, and in every one of them the dataset is loaded into RAM because it is small. My own data (~30 GB of images), however, cannot be loaded into memory, so I'm looking for efficient ways of reading images for further processing. Could anyone provide examples of how I can do that?

P.S. I have two files, train_images and validation_images, each containing lines of the form:

<path/to/img> <label>

0x1337

3 Answers

This is what you're looking for: Tensorflow read images with labels

The relevant code is:

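import tensorflow as tf  # TensorFlow 1.x (queue-based input pipeline)

# Example settings; adjust them to your data
filename = 'train_images'  # list file with "<path/to/img> <label>" lines
num_epochs = 10
batch_size = 32

def read_labeled_image_list(image_list_file):
    """Reads a .txt file containing image paths and labels.
    Args:
       image_list_file: a .txt file with one "/path/to/image label" pair per line
    Returns:
       Two lists: the image paths and their integer labels
    """
    filenames = []
    labels = []
    with open(image_list_file, 'r') as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            # Split on the last space only, so paths containing spaces still work
            path, label = line.rsplit(' ', 1)
            filenames.append(path)
            labels.append(int(label))
    return filenames, labels

def read_images_from_disk(input_queue):
    """Consumes a single (filename, label) pair from the input queue.
    Args:
      input_queue: a [filename, label] list of tensors produced by
        tf.train.slice_input_producer.
    Returns:
      Two tensors: the decoded image and its integer label.
    """
    label = input_queue[1]
    file_contents = tf.read_file(input_queue[0])
    # Use tf.image.decode_jpeg instead if your images are JPEGs
    example = tf.image.decode_png(file_contents, channels=3)
    return example, label

# Reads paths of images together with their labels
image_list, label_list = read_labeled_image_list(filename)

images = tf.convert_to_tensor(image_list, dtype=tf.string)
labels = tf.convert_to_tensor(label_list, dtype=tf.int32)

# Makes an input queue; only the paths are held in memory, and each
# image is read from disk when its entry is dequeued
input_queue = tf.train.slice_input_producer([images, labels],
                                            num_epochs=num_epochs,
                                            shuffle=True)

image, label = read_images_from_disk(input_queue)

# Optional Preprocessing or Data Augmentation
# tf.image implements most of the standard image augmentation,
# e.g. image = tf.image.random_flip_left_right(image)
# tf.train.batch needs a fully defined shape, so resize (or crop) every
# image to a fixed size first; 224x224 is just an example.
image = tf.image.resize_images(image, [224, 224])

# Optional Image and Label Batching
image_batch, label_batch = tf.train.batch([image, label],
                                          batch_size=batch_size)

# num_epochs creates a local variable, and the queues only fill once
# their runner threads have been started
with tf.Session() as sess:
    sess.run(tf.local_variables_initializer())
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    try:
        while not coord.should_stop():
            img_batch, lbl_batch = sess.run([image_batch, label_batch])
            # ... feed img_batch and lbl_batch to your training step ...
    except tf.errors.OutOfRangeError:
        pass  # all epochs have been consumed
    finally:
        coord.request_stop()
        coord.join(threads)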
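The queue-based API used above (tf.train.slice_input_producer, tf.train.batch, queue runners) belongs to TensorFlow 1.x and was later deprecated in favour of tf.data. A rough tf.data equivalent of the same pipeline is sketched below, assuming TensorFlow 2.4 or later (for tf.data.AUTOTUNE); make_dataset and load_image are hypothetical names, and 224x224 is only an example size. tf.io.decode_image is used so that both PNG and JPEG files decode.

```python
import tensorflow as tf  # assumes TensorFlow 2.4+ (eager execution, tf.data.AUTOTUNE)


def read_labeled_image_list(image_list_file):
    """Parse "<path/to/img> <label>" lines into a list of paths and int labels."""
    filenames, labels = [], []
    with open(image_list_file) as f:
        for line in f:
            line = line.strip()
            if line:
                path, label = line.rsplit(" ", 1)
                filenames.append(path)
                labels.append(int(label))
    return filenames, labels


def load_image(path, label):
    """Read and decode one image file; runs lazily inside the tf.data pipeline."""
    contents = tf.io.read_file(path)
    image = tf.io.decode_image(contents, channels=3, expand_animations=False)
    image = tf.image.resize(image, [224, 224])  # example fixed size for batching
    return image, label


def make_dataset(image_list_file, batch_size, shuffle=True):
    """Build a batched dataset that keeps only the file paths in memory."""
    filenames, labels = read_labeled_image_list(image_list_file)
    ds = tf.data.Dataset.from_tensor_slices((filenames, labels))
    if shuffle:
        # Shuffles paths, not pixels, so a full-size buffer is cheap
        ds = ds.shuffle(len(filenames))
    ds = ds.map(load_image, num_parallel_calls=tf.data.AUTOTUNE)
    return ds.batch(batch_size).prefetch(tf.data.AUTOTUNE)
```

Each image is read and decoded only when the batch containing it is requested, so iterating with `for images, labels in make_dataset('train_images', 32): ...` never needs the whole dataset in RAM.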
muneeb

The Udacity tutorial at https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/udacity/4_convolutions.ipynb explains the stochastic (mini-batch) training method. You can use the same approach with one change: instead of saving all images in a single pickle file, save them in chunks of the batch_size you are using. That way you only ever load as much data as one batch needs.
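A minimal sketch of that change in plain Python. save_in_chunks, load_batches and _write_chunk are hypothetical helper names, and `examples` is assumed to be any iterable yielding one (image, label) pair at a time (for instance a generator that reads each image from disk), so only the current chunk is ever held in memory while writing.

```python
import os
import pickle
import tempfile


def _write_chunk(out_dir, index, images, labels):
    """Pickle one batch of images and labels into its own file."""
    path = os.path.join(out_dir, "batch_%05d.pickle" % index)
    with open(path, "wb") as f:
        pickle.dump({"images": images, "labels": labels}, f,
                    protocol=pickle.HIGHEST_PROTOCOL)
    return path


def save_in_chunks(examples, batch_size, out_dir):
    """Write (image, label) pairs into one pickle file per batch_size items.

    Returns the chunk file paths in the order they were written.
    """
    paths, images, labels = [], [], []
    for image, label in examples:
        images.append(image)
        labels.append(label)
        if len(images) == batch_size:
            paths.append(_write_chunk(out_dir, len(paths), images, labels))
            images, labels = [], []
    if images:  # the last chunk may be smaller than batch_size
        paths.append(_write_chunk(out_dir, len(paths), images, labels))
    return paths


def load_batches(paths):
    """Yield (images, labels) one chunk at a time; only that chunk is loaded."""
    for path in paths:
        with open(path, "rb") as f:
            chunk = pickle.load(f)
        yield chunk["images"], chunk["labels"]


if __name__ == "__main__":
    # Toy stand-in for real images: integers with alternating labels
    with tempfile.TemporaryDirectory() as out_dir:
        chunk_paths = save_in_chunks(((i, i % 2) for i in range(10)), 4, out_dir)
        for batch_images, batch_labels in load_batches(chunk_paths):
            print(batch_images, batch_labels)
```

With 10 toy examples and a batch_size of 4, this writes three files holding 4, 4 and 2 examples. In the training loop, each step would take the next batch from load_batches instead of slicing one large in-memory array.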

Ashish Awasthi

The recommended way is to put the data into sharded protobuf (TFRecord) files, where the encoded JPEG and the label(s) are features of a tf.Example. build_image_data.py in the tensorflow/models repository shows how to create such a database of image/label pairs from a directory structure; you'll need to adapt it a bit to your case (it's straightforward). For training time, look at image_processing.py, which shows how to go from the tf.Example proto to image/label tensors: extract the encoded JPEG and the label from the Example record, decode the JPEG, resize, apply augmentations as needed, then enqueue.
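A minimal sketch of both halves, assuming TensorFlow 2.x rather than the TF 1.x scripts named above, with tf.data taking the place of the enqueue step. write_shards, parse_example and make_tfrecord_dataset are hypothetical helpers; the feature keys, the "-00000-of-00002" shard naming and the 224x224 resize are illustrative choices, not taken from those scripts.

```python
import tensorflow as tf  # assumes TensorFlow 2.4+ with eager execution


def _bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))


def _int64_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))


def write_shards(pairs, out_prefix, num_shards):
    """Write (image_path, label) pairs round-robin into num_shards TFRecord files.

    Each record is a tf.train.Example holding the still-encoded image bytes
    and an integer label; nothing is decoded at this stage.
    """
    paths = ["%s-%05d-of-%05d" % (out_prefix, i, num_shards)
             for i in range(num_shards)]
    writers = [tf.io.TFRecordWriter(p) for p in paths]
    try:
        for i, (image_path, label) in enumerate(pairs):
            with open(image_path, "rb") as f:
                encoded = f.read()
            example = tf.train.Example(features=tf.train.Features(feature={
                "image/encoded": _bytes_feature(encoded),
                "image/class/label": _int64_feature(label),
            }))
            writers[i % num_shards].write(example.SerializeToString())
    finally:
        for writer in writers:
            writer.close()
    return paths


FEATURES = {
    "image/encoded": tf.io.FixedLenFeature([], tf.string),
    "image/class/label": tf.io.FixedLenFeature([], tf.int64),
}


def parse_example(serialized):
    """Turn one serialized tf.Example back into an (image, label) pair."""
    parsed = tf.io.parse_single_example(serialized, FEATURES)
    image = tf.io.decode_image(parsed["image/encoded"], channels=3,
                               expand_animations=False)
    image = tf.image.resize(image, [224, 224])  # example size
    return image, parsed["image/class/label"]


def make_tfrecord_dataset(shard_paths, batch_size):
    """Stream records from the shards, then decode, resize and batch them."""
    ds = tf.data.TFRecordDataset(shard_paths)
    ds = ds.shuffle(1000).map(parse_example, num_parallel_calls=tf.data.AUTOTUNE)
    return ds.batch(batch_size).prefetch(tf.data.AUTOTUNE)
```

Storing the still-encoded bytes keeps the shards close to the original image size on disk, and reading a few large sequential files is usually faster than opening millions of small ones.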

etarion