Keras' image_dataset_from_directory, inside the preprocessing module, takes a path as an argument and automatically infers the classes when the images are stored in separate subfolders. In my case, however, I have a single folder, and the image classes are specified in a DataFrame.
.
├── datasets
│ ├── sample_submit.csv
│ ├── test_images
│ │ ├── test_0000.jpg
│ │ ├── test_0001.jpg
│ │ ├── test_0002.jpg
│ │ └── ...
│ ├── test_images.csv
│ ├── train_images
│ │ ├── train_0000.jpg
│ │ ├── train_0001.jpg
│ │ ├── train_0002.jpg
│ │ └── ...
│ └── train_images.csv
└── model.py
TensorFlow's documentation specifies that when you are not inferring the labels, a list or tuple must be passed, which I get from the DataFrame df. However, when I specify the image folder, TensorFlow raises a ValueError because it has found no images:
In [1]: df = pd.read_csv('datasets/train_images.csv')
...: tds = keras.preprocessing\
...: .image_dataset_from_directory('datasets/train_images', list(df['class']),
...: validation_split=0.2, subset='training',
...: seed=123, image_size=(180, 180))
ValueError: Expected the lengths of `labels` to match the number of files in the target directory. len(labels) is 1102 while we found 0 files in datasets/train_images.
Why does Keras not recognise the images within the folder? I have tried setting the "full" relative path with ./datasets/train_images, adding a trailing slash with datasets/train_images/, and also the absolute path, to no avail. What is missing here? Alternatively, is there a more efficient approach in this case where I can still get the train/test split?
EDIT: It seems there is a limitation in Keras; the original question touched on it but was too vague to get to the heart of the matter.
Plain and clear: Keras seems to always scan the subfolders of the directory argument for images and build the dataset from them. The workaround that enables loading the images is to wrap train_images in an additional folder (outer_train) and pass that folder to directory.
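To check the behaviour described above on a given folder, a small diagnostic can compare how many images sit directly in a directory with how many sit one level down. This is only a sketch: count_images is a hypothetical helper, and the list of extensions is an assumption about which formats are treated as images, not something taken from the Keras source.

```python
from pathlib import Path

# Assumed set of image extensions; adjust it to match your data
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp", ".gif"}


def count_images(directory):
    """Return (images directly in `directory`, images in its immediate subfolders)."""
    root = Path(directory)

    def is_image(path):
        return path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES

    direct = sum(1 for p in root.iterdir() if is_image(p))
    nested = sum(
        1
        for sub in root.iterdir()
        if sub.is_dir()
        for p in sub.iterdir()
        if is_image(p)
    )
    return direct, nested


# Hypothetical usage:
# count_images('datasets/train_images')  # images found directly vs. in subfolders
# count_images('outer_train')            # the wrapper folder from the workaround
```

If the first number is non-zero and the second is zero for datasets/train_images, that would be consistent with the loader only looking inside subfolders, which is why the extra outer_train wrapper makes the files visible.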
However, I still have problems with this approach, because now Keras seems unable to take the custom classes passed as a list and outputs Found 1102 files belonging to 1 classes. (in this case, the single class is the name of the subfolder, train_images), so any help is still appreciated.
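One related detail that may be worth ruling out: as far as I understand the documentation, an explicit labels list is expected to contain integers ordered by the alphanumeric order of the image file paths, not by the row order of the CSV. The sketch below aligns the labels accordingly. labels_in_path_order is a hypothetical helper, and the 'id' filename column in the usage comment is an assumption. It only addresses label ordering and encoding; it does not by itself explain the "1 classes" message.

```python
def labels_in_path_order(filenames, classes):
    """Return (integer labels ordered by sorted filename, sorted class names)."""
    if len(filenames) != len(classes):
        raise ValueError("filenames and classes must have the same length")
    # Map each distinct class to an integer index, in sorted order
    class_names = sorted(set(classes))
    class_to_index = {name: i for i, name in enumerate(class_names)}
    # Reorder the rows so the labels follow the alphanumeric order of the filenames
    ordered = sorted(zip(filenames, classes), key=lambda pair: pair[0])
    labels = [class_to_index[cls] for _, cls in ordered]
    return labels, class_names


# Hypothetical usage, assuming the CSV has 'id' (filename) and 'class' columns:
# labels, class_names = labels_in_path_order(list(df['id']), list(df['class']))
# The resulting `labels` list could then be passed as the labels argument.
```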