UnidentifiedImageError when training a model using TF ImageGenerator

Question

I'm training a binary classifier on 21,250 images (total across the 2 classes). My batch size is 425, with steps set to 50.

When I run the model I get the following error:

UnknownError: 2 root error(s) found.
  (0) Unknown:  UnidentifiedImageError: cannot identify image file <_io.BytesIO object at 0x0000019FA183C8B0>
Traceback (most recent call last):

From what I understand, the image here could be corrupt or the image cannot be read for other reasons. Is there a way to get the trainer to skip images that are not identifiable?

Please do let me know other suggestions to consider in my code/ data to fix this problem.

It could be that some of the images are of a 'None' type instead of a jpeg or png. — yudhiesh, Oct 26 '20 at 03:50
@yudhiesh - is there a way of setting all images to be in jpeg format without manually trying to find it and do it? — Ossz, Oct 26 '20 at 03:53
So if the images are in 'NoneType' form I do not think it is possible to convert them to JPEG. You would have to remove them. — yudhiesh, Oct 26 '20 at 03:55

score 2 · Accepted Answer · answered Oct 26 '20 at 04:10

This error is likely caused by files that carry a .jpeg or .png extension but whose contents are not a readable image, for example because they were corrupted at some point during preprocessing. A type check on such a file comes back as None instead of "jpeg" or "png". I have run into this numerous times on large datasets.

What you can do is remove these images, as I do not think it is possible to convert them to the desired format.

Keep a copy of the entire dataset before removing any images, in case something goes wrong in the code.

I do not know the structure of your image folder, so I will just show how to do it once the full path to an image is known. Building those paths is something you would have to do yourself, but it is easy with os.walk(): for each (root, dirs, files) tuple it yields, join root with each file name, image_path = os.path.join(root, file), and run the check below on every image.

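import cv2
import imghdr
import os

# image_path is the full path to a single image file (built with os.walk() as described above)
image = cv2.imread(image_path)      # returns None if OpenCV cannot decode the file
img_type = imghdr.what(image_path)  # returns None if the format is not recognised

# Delete anything that fails to decode or is not detected as JPEG.
# Note: this also deletes valid PNG files.
if image is None or img_type != "jpeg":
    os.remove(image_path)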
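
The loop over os.walk() described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the answer: it swaps imghdr and cv2 for a standard-library check of the leading signature bytes of JPEG and PNG files (imghdr was removed from the standard library in Python 3.13). The names sniff_format and find_suspect_files and the path "data/train" are hypothetical, and the sketch only lists suspect files so they can be reviewed before anything is deleted.

```python
import os

# Leading "magic" bytes of the two formats mentioned in the thread.
SIGNATURES = {
    "jpeg": b"\xff\xd8\xff",
    "png": b"\x89PNG\r\n\x1a\n",
}


def sniff_format(path):
    """Return "jpeg" or "png" if the file starts with that signature, else None."""
    with open(path, "rb") as f:
        header = f.read(8)
    for name, magic in SIGNATURES.items():
        if header.startswith(magic):
            return name
    return None


def find_suspect_files(dataset_dir):
    """Walk dataset_dir and collect paths whose header is neither JPEG nor PNG."""
    suspects = []
    for root, _dirs, files in os.walk(dataset_dir):
        for name in files:
            path = os.path.join(root, name)
            if sniff_format(path) is None:
                suspects.append(path)
    return sorted(suspects)


if __name__ == "__main__":
    # "data/train" is a placeholder path, not taken from the question.
    for path in find_suspect_files("data/train"):
        print(path)  # review the list first; call os.remove(path) only once satisfied
```

A header check like this catches empty files and files that are not images at all, but a file with a valid header and a truncated body would still pass. If Pillow is installed, opening each file with Image.open() and calling verify() inside a try/except is a stricter test, and it is closer to what the training pipeline itself does when it raises UnidentifiedImageError.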

answered Oct 26 '20 at 04:10 by yudhiesh

Thanks. I am assuming I should run this at the beginning of my code, after loading the file folder? – Ossz Oct 26 '20 at 04:18
The module cv2 does not exist. Is this a Python 2 module? I'm running Python 3 in a Jupyter notebook... – Ossz Oct 26 '20 at 04:21
Yes, use ```os.walk()``` to get all the files, roots and directories. You can get a sample of the full path by opening an image and looking at its path. Then use ```os.path.join()``` on the root and each file name, but print what each one returns first to double check. This code has to run on the full path of every image. – yudhiesh Oct 26 '20 at 04:24
cv2 has to be installed first; use ```!pip3 install opencv-python``` – yudhiesh Oct 26 '20 at 04:26
Interestingly, I installed the package but I am still getting the error 'No module named 'cv2''. Why do you think this is? – Ossz Oct 26 '20 at 04:43
[Try this](https://stackoverflow.com/questions/38109270/cv2-import-error-on-jupyter-notebook) – yudhiesh Oct 26 '20 at 04:46
I successfully implemented your code to remove the corrupt/ non-type files. It works! I observed the file number in directories falling and no longer getting the error when training the dataset. Many thanks! – Ossz Oct 26 '20 at 06:45
