152

I am currently using PIL.

from PIL import Image
try:
    im = Image.open(filename)
    # do stuff
except IOError:
    # filename not an image file
    pass

However, while this covers most cases, some image files, such as xcf, svg and psd, are not detected. Psd files throw an OverflowError exception.

Is there some way I could include them as well?

mohaghighat
Sujoy
  • 27
    It's not particularly common practice to close duplicates across different languages. If you can't find any other Python questions with this leave it open as there could be Python-specific solutions that people want to post that did not make it to the question you posted. – Paolo Bergantino May 20 '09 at 18:09
  • yes, first of all I was really hoping for a python lib I didnt know about :P and then as ben pointed out, just the magic numbers doesnt validate the entire image. – Sujoy May 20 '09 at 18:14
  • @Sujoy, validating an entire image is nearly impossible, unless you already have a copy of it, because the computer can't tell the difference between a correct colour pixel, and a garbled set of 1s and 0s, as long as all the control (magic numbers) are correct. – DevinB May 20 '09 at 18:25
  • @devinb, agreed, i will just get the magic numbers and be done with it unless someone else comes up with something better to call for a refactor :) – Sujoy May 20 '09 at 18:31
  • xcf and psd aren't really images, they're project files that contain (often many) images... you could probably make a case for svg though. – mgalgs Jan 01 '14 at 19:10
  • PIL is able to detect image file defects/errors, but you need to perform at least one image manipulation operation in order to detect a few types of errors, e.g. I applied the PIL transpose transformation. Only loading, as you suggest, sometimes fails to detect errors. Details in my answer below. – Fabiano Tarlao Oct 02 '19 at 06:31
  • Well, I do not know about the insides of psd, but I, sure, know that, as a matter of fact, svg is not an image file per se, -- it is based on xml, so it is, essentially, a plain text file. – shylent May 20 '09 at 18:03

11 Answers

256

I have just found the built-in imghdr module. From the Python documentation:

The imghdr module determines the type of image contained in a file or byte stream.

This is how it works:

>>> import imghdr
>>> imghdr.what('/tmp/bass')
'gif'

Using a module is much better than reimplementing similar functionality.

UPDATE: imghdr is deprecated as of Python 3.11 and was removed in Python 3.13.

Nadia Alramli
  • 5
    yes imghdr works for most image formats but not all. as per my original problem with svg, xcf and psd files, well those are undetected in imghdr as well – Sujoy May 26 '09 at 12:54
  • Yes, but instead of reinventing the wheel there is something to start with. – Nadia Alramli May 26 '09 at 13:18
  • You can, for example, refuse undetected image headers. If the image was not detected by imghdr it is probably not supported by PIL either. Or you can start by looking at the imghdr source code to see how it works. – Nadia Alramli May 26 '09 at 13:21
  • 2
    Your answer is actually better, thanks. Like someone above said *...but solving a problem 99% of the way is often better then not solving it at all..* – RinkyPinku Jun 03 '15 at 11:54
  • 4
    Worth to note: `imghdr.what(path)` returns `None` if given `path` is not recognized image file type. [List](https://docs.python.org/3/library/imghdr.html) of currently recognized image types: *rgb*, *gif*, *pbm*, *pgm*, *ppm*, *tiff*, *rast*, *xbm*, *jpeg*, *bmp*, *png*, *webp*, *exr*. – patryk.beza Apr 06 '16 at 15:29
  • 3
    I have found that occasionally `imghdr.what(path)` returns `None` even if the file is a valid image, particularly for jpegs. – GuillaumeDufay Jan 02 '17 at 19:32
  • 4
    Be careful! A valid hdr doesn't mean a valid image (e.g. the image bytes may have been scrambled!) – Filippo Mazza Nov 30 '17 at 13:37
  • 1
    Per @FilippoMazza 's comment, I can confirm that a bad image that got cut off during transfer can pass this test, but will break when PIL tries to read it. – kevinmicke Mar 21 '18 at 19:41
  • Just tried and it fails on many .jpg images. I think this library is bugged (as of 2019). – Logic1 Aug 14 '19 at 16:07
  • @Logic1 could you please provide some samples on internet? – Massimo Dec 12 '19 at 21:00
  • [Here is one example](https://imgur.com/a/pzKb4f7) which `imghdr` fails to check the image type (JPEG), is there anyone else who gets the same result ? By default (as of python 3.9) `imghdr.what(...)` only loads first 32 bytes of a file to buffer then test whether the pattern in the buffer matches any file type , I don't know about how all types of image headers are structured, it seems that some images may have more complex header structure which `imghdr` failed to check – Ham Jul 03 '21 at 15:38
  • this should be the accepted answer – Feline Feb 22 '22 at 20:21
55

In addition to what Brian is suggesting you could use PIL's verify method to check if the file is broken.

im.verify()

Attempts to determine if the file is broken, without actually decoding the image data. If this method finds any problems, it raises suitable exceptions. This method only works on a newly opened image; if the image has already been loaded, the result is undefined. Also, if you need to load the image after using this method, you must reopen the image file.

Two-Bit Alchemist
Nadia Alramli
  • 1
    well the main problem is that svg,xcf and psd files cannot be opened with Image.open() hence, no chance of verifying with im.verify() – Sujoy May 20 '09 at 19:07
  • 29
    My god the PIL documentation is terrible. What exactly is a "suitable exception"? – Timmmm Jul 26 '12 at 19:45
  • Here's the link to the [Pillow documentation for Image.verify()](https://pillow.readthedocs.org/en/latest/reference/Image.html#PIL.Image.Image.verify). Unfortunately, it's no better, and it looks like they just lifted the paragraph above without adding anything. – Two-Bit Alchemist Aug 08 '14 at 18:34
  • I've seen verify raise SyntaxError for corrupt png files – Carl Nov 20 '15 at 03:41
  • is there a way to verify "WITH actually decoding the image data"? – Trevor Boyd Smith Sep 13 '17 at 14:38
  • `im.verify()` will work for some bad images, but I've found images where it won't catch them, yet functions like `im.crop()` will throw an exception. For my use case, I found it best to just wrap the `im.crop()` in a `try` block and handle the exceptions as necessary. @TrevorBoydSmith This also gets around having to open an image twice to use `im.verify()`—you just read the data, and if breaks, you handle the exception: EAFP. – kevinmicke Mar 21 '18 at 22:32
  • 6
    mmh the source code seems to verify... nothing! https://pillow.readthedocs.io/en/latest/_modules/PIL/Image.html#Image.verify – Massimo Dec 12 '19 at 21:06
  • thank you for explain that we need to reopen the image for reuse it – Rayann Nayran Oct 17 '21 at 00:49
  • pillows `im.verify()` doesn't do anything. – dreamflasher Jun 02 '22 at 14:01
  • @Massimo if you follow the source a bit further, you'll see that PngImagePlugin is the only plugin that implements a `verify` that *does* throw an exception: https://pillow.readthedocs.io/en/latest/_modules/PIL/PngImagePlugin.html. It looks like in this case, you can expect an `OSError` :) – ntjess Jun 24 '22 at 13:17
30

In addition to the PIL image check, you can also add a file name extension check like this:

filename.lower().endswith(('.png', '.jpg', '.jpeg', '.tiff', '.bmp', '.gif'))

Note that this only checks whether the file name has a valid image extension; it does not open the file to see whether it is a valid image. That is why you also need to use PIL or one of the libraries suggested in the other answers.
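
A minimal sketch of that pre-filter as a reusable helper. The function name `has_image_extension` and the `IMAGE_EXTENSIONS` set are illustrative assumptions, not part of PIL; the extensions are the ones listed above.

```python
from pathlib import Path

# Illustrative set of extensions; extend it to match the formats you accept.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tiff", ".bmp", ".gif"}


def has_image_extension(filename):
    """Return True if the file name ends with a known image extension.

    Only the name is inspected; the file itself is never opened.
    """
    return Path(filename).suffix.lower() in IMAGE_EXTENSIONS
```

Because only the last suffix is compared, a name such as `archive.jpg.zip` is rejected, and the comparison is case-insensitive.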

tsveti_iko
  • 5
    What if the extensions are incorrect in the files? E.g, a text file is saved with .jpg extension or vice versa. – hafiz031 Aug 11 '20 at 04:00
  • 3
    @hafiz031 To get the actual format you can do `from PIL import Image img = Image.open(filename) print(img.format)` and then check it like this: `img.format.lower() in ['png', 'jpg', 'jpeg', 'tiff', 'bmp', 'gif']` – tsveti_iko Aug 12 '20 at 11:35
  • Unfortunately this didn't work for me. It is still identifying a corrupted image as a JPEG image. Finally I managed to handle this case in this way (I am using OpenCv): https://stackoverflow.com/a/63421847/6907424 – hafiz031 Aug 15 '20 at 02:28
16

A lot of times the first couple chars will be a magic number for various file formats. You could check for this in addition to your exception checking above.
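
As a rough sketch of such a check, the snippet below compares the first bytes of a file against a small table of well-known signatures. The helper name `sniff_image_type` and the choice of formats are assumptions made for illustration. The table includes PSD and XCF because the question mentions them. A matching signature says nothing about whether the rest of the file is intact.

```python
# Leading byte signatures for a few formats. Short signatures such as
# BMP's two bytes can match unrelated files, so treat a hit as a hint only.
SIGNATURES = {
    "png": b"\x89PNG\r\n\x1a\n",
    "jpeg": b"\xff\xd8\xff",
    "gif": b"GIF8",        # covers both GIF87a and GIF89a
    "bmp": b"BM",
    "psd": b"8BPS",
    "xcf": b"gimp xcf ",
}


def sniff_image_type(filename):
    """Return a format name if the file starts with a known signature, else None."""
    with open(filename, "rb") as f:
        header = f.read(16)
    for name, magic in SIGNATURES.items():
        if header.startswith(magic):
            return name
    return None
```

SVG has no binary signature because it is XML text, so it needs a different kind of test.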

Brian R. Bondy
14

One option is to use the filetype package.

Installation

python -m pip install filetype

Advantages

  1. Fast: Does its work by loading only the first few bytes of your image (check on the magic number)
  2. Supports different MIME types: images, videos, fonts, audio, archives.

Example

filetype >= 1.0.7

import filetype

filename = "/path/to/file.jpg"

if filetype.is_image(filename):
    print(f"{filename} is a valid image...")
elif filetype.is_video(filename):
    print(f"{filename} is a valid video...")

filetype <= 1.0.6

import filetype

filename = "/path/to/file.jpg"

if filetype.image(filename):
    print(f"{filename} is a valid image...")
elif filetype.video(filename):
    print(f"{filename} is a valid video...")

Additional information on the official repo: https://github.com/h2non/filetype.py

Alex Fortin
12

Update

I also implemented the following solution in my Python script here on GitHub.

I also verified that damaged files (jpg) are frequently not 'broken' images, i.e., a damaged picture file sometimes remains a legitimate picture file: the original image is lost or altered, but you can still load it without errors. File truncation, however, always causes errors.

End Update

You can use the Python Pillow (PIL) module, with most image formats, to check whether a file is a valid and intact image file.

If you also want to detect broken images, @Nadia Alramli correctly suggests the im.verify() method, but it does not detect all possible image defects. For example, im.verify() does not detect truncated images (which most viewers load with a greyed area).

Pillow is able to detect these types of defects too, but you have to apply an image manipulation or an image decode/recode in order to trigger the check. In the end, I suggest using this code:

from PIL import Image

try:
  im = Image.open(filename)
  im.verify()  # also run verify; I don't know whether it catches other types of defects
  im.close()   # the image must be reopened after verify()
  im = Image.open(filename)
  im.transpose(Image.FLIP_LEFT_RIGHT)  # forces the image data to be decoded
  im.close()
except Exception:
  # manage exceptions here
  pass

In case of image defects, the code inside the try block will raise an exception. Note that im.verify() is about 100 times faster than performing the image manipulation (and I think that flip is one of the cheaper transformations). With this code you can verify a set of images at about 10 MBytes/sec with standard Pillow or 40 MBytes/sec with the Pillow-SIMD module (modern 2.5 GHz x86_64 CPU).

For other formats such as xcf, you can use Wand, the ImageMagick wrapper; see the Wand documentation for usage and installation instructions. The code is as follows:

import wand.image

im = wand.image.Image(filename=filename)
im.flip()  # flip is a method; call it so the image data is actually processed
im.close()

However, in my experiments Wand does not detect truncated images; I think it loads the missing parts as a greyed area without warning.

I read that ImageMagick has an external command, identify, that could do the job, but I have not found a way to invoke it programmatically and I have not tested this route.
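
One possible way to call it from Python is through the standard subprocess module. This is only a sketch under assumptions: the helper name `identify_ok` is made up for illustration, the command is assumed to be on the PATH under the name `identify` (newer ImageMagick installs may expose it as `magick identify` instead), and a zero exit status is taken to mean the file was readable. Whether identify actually catches truncated files is not verified here.

```python
import subprocess


def identify_ok(filename, executable="identify"):
    """Return True if the identify command exits with status 0 for the file.

    Returns False if the command reports an error, times out, or the
    executable cannot be started (for example, ImageMagick is not installed).
    """
    try:
        result = subprocess.run(
            [executable, filename],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0
```

Passing the arguments as a list avoids shell quoting problems with unusual file names.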

I suggest always performing a cheap preliminary check that the file size is not zero (or very small):

import os

statfile = os.stat(filename)
filesize = statfile.st_size
if filesize == 0:
  # manage the 'faulty image' case here
  pass
Tiago Martins Peres
Fabiano Tarlao
7

On Linux, you could use python-magic which uses libmagic to identify file formats.

AFAIK, libmagic looks into the file and tries to tell you more about it than just the format, such as bitmap dimensions, format version, etc. So you might see this as a superficial test for "validity".

For other definitions of "valid" you might have to write your own tests.
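
As an illustration of such a hand-written test, here is a sketch for SVG, which is XML text rather than a binary format. The helper name `looks_like_svg` is hypothetical. It only checks that the file parses as XML up to the root element and that the root is `svg`; it does not validate the drawing itself. For untrusted input, a hardened XML parser may be preferable.

```python
import xml.etree.ElementTree as ET

SVG_NAMESPACE = "http://www.w3.org/2000/svg"


def looks_like_svg(filename):
    """Return True if the file parses as XML and its root element is <svg>."""
    try:
        with open(filename, "rb") as f:
            # The first "start" event belongs to the root element,
            # so we can decide without reading the whole document.
            for _event, element in ET.iterparse(f, events=("start",)):
                return element.tag in ("svg", "{%s}svg" % SVG_NAMESPACE)
    except (ET.ParseError, OSError):
        return False
    return False
```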

Ham
fmarc
6

You could use the Python bindings to libmagic, python-magic and then check the mime types. This won't tell you if the files are corrupted or intact but it should be able to determine what type of image it is.

Kamil Kisiel
1

Adapting from Fabiano and Tiago's answer.

from PIL import Image

def check_img(filename):
    try:
        im = Image.open(filename)
        im.verify()
        im.close()
        im = Image.open(filename) 
        im.transpose(Image.FLIP_LEFT_RIGHT)
        im.close()
        return True
    except Exception:
        print(filename,'corrupted')
        return False

if not check_img('/dir/image'):
    print('do something')
durranaik
-1

The file extension can be used to check for image files as follows.

import os
for f in os.listdir(folderPath):
    # match the extension at the end of the name, case-insensitively
    if f.lower().endswith((".jpg", ".bmp")):
        filePath = os.path.join(folderPath, f)
 
-3
import os

extensions = (".jpg", ".png", ".jpeg")
for (dirpath, dirs, files) in os.walk(path):
    for file in files:
        if file.endswith(extensions):
            print(dirpath)
            print("Valid", file)
        else:
            print(dirpath)
            print("InValid", file)
  • 1
    Your code has some indentation issues and won't run properly. Also, consider adding some explanations as to why and how your code solves the problem. Code-only answers may not be so helpful for future readers coming here. – Tomerikoo Feb 28 '20 at 12:43
  • Here we have used Agrparser method. – rObinradOO Mar 05 '20 at 12:31