Loading folders of images in tensorflow

Question

I'm new to TensorFlow, but I have already worked through the official tutorials and many others from around the web. I built a small convolutional neural network on the MNIST images. Nothing special, but I would like to test it on my own images. Here is my problem: I created several folders, and the name of each folder is the class (label) that the images inside belong to.

The images have different shapes; that is, they have no fixed size.

How can I load them for use with TensorFlow?

I followed many tutorials and answers, both here on StackOverflow and on other Q&A sites, but I still have not figured out how to do this.

DomJack · Answer 1 · 2019-05-09T00:57:58.277

The tf.data API (TensorFlow 1.4 onwards) is great for things like this. The pipeline will look something like the following:

Create an initial tf.data.Dataset object that iterates over all examples;
(if training) shuffle/repeat the dataset;
map it through some function that makes all images the same size;
batch;
(optionally) prefetch, to tell your program to collect and preprocess subsequent batches of data while the network is processing the current batch; and
get inputs.

There are a number of ways of creating your initial dataset (see here for a more in-depth answer).

TFRecords with Tensorflow Datasets

Supporting TensorFlow 1.12 onwards, TensorFlow Datasets provides a relatively straightforward API for creating tfrecord datasets, and also handles data downloading, sharding, statistics generation and other functionality automatically.

See e.g. this image classification dataset implementation. There's a lot of bookkeeping stuff in there (download URLs, citations etc.), but the technical part boils down to specifying features and writing a _generate_examples function:

features = tfds.features.FeaturesDict({
    "image": tfds.features.Image(shape=(_TILES_SIZE,) * 2 + (3,)),
    "label": tfds.features.ClassLabel(names=_CLASS_NAMES),
    "filename": tfds.features.Text(),
})

...

def _generate_examples(self, root_dir):
  root_dir = os.path.join(root_dir, _TILES_SUBDIR)
  for i, class_name in enumerate(_CLASS_NAMES):
    class_dir = os.path.join(root_dir, _class_subdir(i, class_name))
    fns = tf.io.gfile.listdir(class_dir)

    for fn in sorted(fns):
      image = _load_tif(os.path.join(class_dir, fn))
      yield {
          "image": image,
          "label": class_name,
          "filename": fn,
      }

You can also generate the tfrecords using lower level operations.

Load images via `tf.data.Dataset.map` and `tf.py_func(tion)`

Alternatively you can load the image files from filenames inside tf.data.Dataset.map as below.

image_paths, labels = load_base_data(...)
epoch_size = len(image_paths)
image_paths = tf.convert_to_tensor(image_paths, dtype=tf.string)
labels = tf.convert_to_tensor(labels)

dataset = tf.data.Dataset.from_tensor_slices((image_paths, labels))

if mode == 'train':
    dataset = dataset.repeat().shuffle(epoch_size)


def map_fn(path, label):
    # path/label represent values for a single example
    image = tf.image.decode_jpeg(tf.read_file(path))

    # some mapping to constant size - be careful with distorting aspect ratios
    image = tf.image.resize_images(image, out_shape)
    # color normalization - just an example
    image = tf.to_float(image) * (2. / 255) - 1
    return image, label


# num_parallel_calls > 1 induces intra-batch shuffling
dataset = dataset.map(map_fn, num_parallel_calls=8)
dataset = dataset.batch(batch_size)
# try one of the following
dataset = dataset.prefetch(1)
# dataset = dataset.apply(
#            tf.contrib.data.prefetch_to_device('/gpu:0'))

images, labels = dataset.make_one_shot_iterator().get_next()

I've never worked in a distributed environment, but I've never noticed a performance hit from using this approach over tfrecords. If you need more custom loading functions, also check out tf.py_func.
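
The comment in `map_fn` about distorting aspect ratios can be made concrete. The sketch below is plain Python and only does the arithmetic: given an image size and a target box, it works out the largest size that fits without stretching and how much padding is left over. `fit_within` is a hypothetical name introduced here, not part of TensorFlow; in TensorFlow 2, `tf.image.resize_with_pad` performs a resize of this kind directly.

```python
def fit_within(height, width, target_height, target_width):
    """Scale (height, width) to fit inside the target box, keeping the aspect ratio.

    Returns the resized (height, width) and the (vertical, horizontal)
    padding needed to fill the rest of the target box.
    """
    # The smaller of the two ratios is the largest scale that still fits.
    scale = min(target_height / height, target_width / width)
    # Clamp to [1, target] so rounding can never leave the box.
    new_height = min(target_height, max(1, round(height * scale)))
    new_width = min(target_width, max(1, round(width * scale)))
    padding = (target_height - new_height, target_width - new_width)
    return (new_height, new_width), padding


# A wide 881 x 2079 image squeezed into a 100 x 100 box.
print(fit_within(881, 2079, 100, 100))  # prints ((42, 100), (58, 0))
```

Whether to pad, crop, or accept some distortion depends on the data; padding keeps the whole image at the cost of some empty pixels.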

More general information here, and notes on performance here
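
The snippet above calls `load_base_data(...)` without defining it. A minimal sketch of what it could look like for the layout in the question (one subfolder per class) follows. The function body, the `JPEG_EXTENSIONS` filter, and the choice of integer labels in sorted-folder order are all assumptions made for illustration, not the answer author's code. Only JPEG files are collected because `map_fn` decodes with `tf.image.decode_jpeg`.

```python
import os

# Assumed filter: the pipeline above decodes JPEGs, so skip everything else.
JPEG_EXTENSIONS = (".jpg", ".jpeg")


def load_base_data(root_dir):
    """Return (image_paths, labels) for a root_dir/<class_name>/<image> tree."""
    # Sort the class folders so the name -> integer mapping is stable across runs.
    class_names = sorted(
        d for d in os.listdir(root_dir)
        if os.path.isdir(os.path.join(root_dir, d))
    )
    image_paths, labels = [], []
    for label, class_name in enumerate(class_names):
        class_dir = os.path.join(root_dir, class_name)
        for fn in sorted(os.listdir(class_dir)):
            if fn.lower().endswith(JPEG_EXTENSIONS):
                image_paths.append(os.path.join(class_dir, fn))
                labels.append(label)
    return image_paths, labels
```

With folders `mydata/cats` and `mydata/dogs`, `load_base_data("mydata")` would label every cat image 0 and every dog image 1.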

For those getting `NameError: name 'load_base_data' is not defined`: I guess `load_base_data(...)` can be replaced with something like `["mydata/cats", "mydata/dogs"], [0, 1]`. — Nicolas Raoul, Sep 20 '18 at 05:40
For those getting `NameError: global name 'out_shape' is not defined` and other errors, I guess it might work better after adding these lines at the beginning of the file: `import tensorflow as tf`, `mode = 'train'`, `out_shape = tf.convert_to_tensor([100, 100])`, `batch_size = 10`. Not sure whether these values make sense or not, though. — Nicolas Raoul, Sep 20 '18 at 05:47

score 3 · Answer 2 · edited Sep 20 '18 at 04:22

Sample input pipeline script to load images and labels from a directory. You could do preprocessing (resizing images, etc.) after this.

import tensorflow as tf
filename_queue = tf.train.string_input_producer(
    tf.train.match_filenames_once("/home/xxx/Desktop/stackoverflow/images/*/*.png"))

image_reader = tf.WholeFileReader()
key, image_file = image_reader.read(filename_queue)
S = tf.string_split([key],'/')
length = tf.cast(S.dense_shape[1],tf.int32)
# adjust constant value corresponding to your paths if you face issues. It should work for above format.
label = S.values[length-tf.constant(2,dtype=tf.int32)]
label = tf.string_to_number(label,out_type=tf.int32)
image = tf.image.decode_png(image_file)

# Start a new session to show example output.
with tf.Session() as sess:
    # Required to get the filename matching to run.
    sess.run(tf.local_variables_initializer())
    sess.run(tf.global_variables_initializer())

    # Coordinate the loading of image files.
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)

    for i in range(6):
        # Get an image tensor and print its value.
        key_val,label_val,image_tensor = sess.run([key,label,image])
        print(image_tensor.shape)
        print(key_val)
        print(label_val)


    # Finish off the filename queue coordinator.
    coord.request_stop()
    coord.join(threads)

File Directory

./images/1/1.png
./images/1/2.png
./images/3/1.png
./images/3/2.png
./images/2/1.png
./images/2/2.png

Output:

 (881, 2079, 3)
 /home/xxxx/Desktop/stackoverflow/images/3/1.png
 3
 (155, 2552, 3)
 /home/xxxx/Desktop/stackoverflow/images/2/1.png
 2
 (562, 1978, 3)
 /home/xxxx/Desktop/stackoverflow/images/3/2.png
 3
 (291, 2558, 3)
 /home/xxxx/Desktop/stackoverflow/images/1/1.png
 1
 (157, 2554, 3)
 /home/xxxx/Desktop/stackoverflow/images/1/2.png
 1
 (866, 936, 3)
 /home/xxxx/Desktop/stackoverflow/images/2/2.png
 2
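
The label lookup in the script (`S.values[length - 2]`) takes the second-to-last component of the file path, which is the class folder. The same rule in plain Python can serve as a quick sanity check on how paths map to labels; `label_from_path` is a name introduced here for illustration, and it assumes `/` separators and numeric folder names as in the directory listing above.

```python
def label_from_path(path):
    """Return the integer label encoded in the parent folder name."""
    parts = path.split("/")
    # parts[-1] is the file name, parts[-2] is the class folder.
    return int(parts[-2])


print(label_from_path("/home/xxxx/Desktop/stackoverflow/images/3/1.png"))  # prints 3
```

If the folders are named with words rather than numbers, the `int(...)` conversion would fail, and a name-to-index mapping would be needed instead.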

First of all, thanks for the quick reply. I tried your code snippet and it raises the following error. tensorflow.python.framework.errors_impl.OutOfRangeError: FIFOQueue '_0_input_producer' is closed and has insufficient elements (requested 1, current size 0) [[Node: ReaderReadV2 = ReaderReadV2[_device="/job:localhost/replica:0/task:0/cpu:0"](WholeFileReaderV2, input_producer)]] — SilvioBarra, Jun 07 '17 at 15:48
I think it's not able to find the images. Is the path to the folders correct? Try it with a few images. — Harsha Pokkalla, Jun 07 '17 at 17:37
I was able to fix the insufficient elements error with the following two lines of code: `sess.run(tf.local_variables_initializer())` and `sess.run(tf.global_variables_initializer())` — Timo Denk, Oct 02 '17 at 11:48
@pnz: Just before the `# Required to get the filename matching to run.` line, with 4 spaces on the left. — Nicolas Raoul, Sep 20 '18 at 04:47

score 1 · Answer 3 · answered Jun 16 '21 at 18:43


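To load the images as tensors of equal size, just use this (labels are inferred from the subfolder names, and every image is resized to `image_size`, which defaults to 256x256):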

tf.keras.preprocessing.image_dataset_from_directory(dir)

docs: https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/image_dataset_from_directory

answered Jun 16 '21 at 18:43 by drGabriel

score 0 · Answer 4 · answered Jan 02 '23 at 15:28

To load images with different shapes, TensorFlow provides a pipeline implementation (ImageDataGenerator):

from tensorflow.keras.preprocessing.image import ImageDataGenerator

TARGET_SHAPE = (500,500)
BATCH_SIZE = 32
train_dir = "train_images_directory" #ex: images/train/
test_dir = "test_images_directory" #ex: images/test/

train_images_generator = ImageDataGenerator(rescale=1.0/255)
# every image is resized to TARGET_SHAPE as it is loaded
train_data_gen = train_images_generator.flow_from_directory(
    batch_size=BATCH_SIZE,
    directory=train_dir,
    target_size=TARGET_SHAPE,
    shuffle=True,
    class_mode='sparse')

# do the same for validation and test dataset
# 1- image_generator 2- load images from directory with target shape
