How can I catch corrupt JPEGs when loading an image with imread() in python?

Question

I'm talking about errors like "Corrupt JPEG data: bad Huffman code" or "Corrupt JPEG data: premature end of data segment". Currently, these are treated as warnings rather than errors and are only printed to stderr, but I need to catch them. Someone on Stack Overflow provided a solution that involves changing the OpenCV source code, but I can't figure out how to do that from Python. He even created a pull request on GitHub, but it was declined.

What I actually need is to check jpg, jpeg, and png files for integrity and corruption. imread() does this, but I can't extract any useful data from cv2 because it only prints the result to stderr.

My current code:

import os
import cv2

src_dir = './input/'
dest = './output/'
if not os.path.exists(dest):
    os.makedirs(dest)
    print('Output directory created')
for filename in os.listdir(src_dir):
    try:
        print(filename)
        img = cv2.imread(src_dir + filename)
    except cv2.error as e:
        print('Error: ' + str(e))

It just prints the filename, and if a file has an integrity or corruption problem, cv2 prints the warning to stderr. I also tried redirecting the output to another file like this:

python main.py >> output.txt

I wanted to parse that file afterwards, but it only contains my own print output, not the messages from imread().

So, is there any solution to this problem? I have tried many corruption checkers, such as Pillow's verify() method and the integv library, but only imread() properly detects all my corrupted files.

Did you do any research? https://stackoverflow.com/q/74999718/2836621 — Mark Setchell, Jan 04 '23 at 21:07
there *are* ways to catch some library call's stdout/stderr output using Python (this is unrelated to OpenCV). you aren't asking about that though. you are asking for a library that tells you what's wrong with the image file (also unrelated to OpenCV). — Christoph Rackwitz, Jan 05 '23 at 08:54
@christoph-rackwitz, which are those ways? First of all, I'm asking for a way to catch this stderr output. If that is not possible, then I would like to see libraries that provide the same functionality as OpenCV but in a form that can be caught. That stderr output does exactly what I need: it tells me whether an image has problems with its structure. And currently, you can't just write an if statement for it. — Bohdan, Jan 05 '23 at 15:34
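
On the stderr question raised in the comments above: messages that a native library writes to stderr bypass Python's sys.stderr object, so contextlib.redirect_stderr will not see them. A minimal sketch of one workaround, assuming the warning is written to file descriptor 2 (this may vary with the OpenCV build), is to point that descriptor at a temporary file for the duration of the call. The names capture_native_stderr and is_suspect_jpeg are hypothetical helpers for illustration, not part of OpenCV.

```python
import os
import sys
import tempfile
from contextlib import contextmanager


@contextmanager
def capture_native_stderr():
    """Temporarily redirect file descriptor 2 into a temporary file.

    Yields a list; the captured text is appended to it when the block exits.
    """
    captured = []
    sys.stderr.flush()
    saved_fd = os.dup(2)                  # keep a handle on the real stderr
    with tempfile.TemporaryFile(mode="w+b") as tmp:
        os.dup2(tmp.fileno(), 2)          # fd 2 now writes into the temp file
        try:
            yield captured
        finally:
            sys.stderr.flush()
            os.dup2(saved_fd, 2)          # restore the real stderr
            os.close(saved_fd)
            tmp.seek(0)
            captured.append(tmp.read().decode(errors="replace"))


def is_suspect_jpeg(path):
    """Hypothetical helper: True if imread fails or a JPEG warning was printed.

    Assumes the warning text starts with "Corrupt JPEG data", as quoted
    in the question.
    """
    import cv2  # imported lazily so the capture helper works without OpenCV

    with capture_native_stderr() as captured:
        img = cv2.imread(path)
    return img is None or "Corrupt JPEG data" in captured[0]
```

Calling is_suspect_jpeg() on each path in the loop from the question would then allow an ordinary if statement on the result. The redirect affects the whole process while it is active, so this sketch is not suitable for multi-threaded code.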

Gowthaman · Answer 1 · 2023-01-05T09:47:40.800

0

imread() loads an image from the specified file and returns a numpy.ndarray (NumPy N-dimensional array) on success. If the file cannot be read at all, it returns None instead of raising an exception, and for a partially corrupt JPEG it may still return an array and only print a warning, so it does not help here on its own.

Pillow's Image.verify() raises an exception if it finds a problem with the image file. Pillow is also generous with large images: by default it only starts warning above roughly 89 million pixels, and this limit can be changed via Image.MAX_IMAGE_PIXELS.

import os
from PIL import Image

for filename in os.listdir('./'):
    # Check the file types mentioned in the question, ignoring case
    if filename.lower().endswith(('.jpg', '.jpeg', '.png')):
        try:
            img = Image.open(os.path.join('./', filename))  # open the image file
            img.verify()  # verify that it is, in fact, an image
        except (OSError, SyntaxError):
            print('Bad file:', filename)  # print out the names of corrupt files

edited Jan 05 '23 at 09:47

answered Jan 05 '23 at 07:33


does your answer require the use of "base64"? if it does not, simplify. – Christoph Rackwitz Jan 05 '23 at 08:51
StringIO and BytesIO will help if the image contains data in a different format or unsupported characters in the files. – Gowthaman Jan 05 '23 at 09:33
You don't need BytesIO constructed **from an open() call**. just pass the file object itself. -- StringIO for binary data (result of b64decode) is a terrible idea. -- all of that is beside the point of using pillow's `Image.verify()`, which requires an Image instance. The creation of that Image instance is secondary. – Christoph Rackwitz Jan 05 '23 at 09:37
You could try to verify [that](https://ibb.co/Pt0Y0HX) image. The `Image.verify()` method does not work for it, but you can clearly tell it's corrupted. `imread()` loads it correctly, reports what's wrong with it, and returns a numpy.ndarray as expected. You just cannot interact with stderr, where that "what's wrong" message ends up. – Bohdan Jan 05 '23 at 15:40
that file isn't corrupted. that file is a re-encoding of a corrupted file. – Christoph Rackwitz Jan 05 '23 at 17:06
