check if a URL to an image is up and exists in Python

Question

I am making a website. I want to check from the server whether the link that the user submitted is actually an image that exists.

score 19 · Answer 1 · answered Feb 21 '18 at 15:31

This is the approach that works best for my application; it also draws on earlier comments:

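import requests

def is_url_image(image_url):
   image_formats = ("image/png", "image/jpeg", "image/jpg")
   # Note: requests.head() does not follow redirects unless allow_redirects=True.
   r = requests.head(image_url)
   # Tolerate a missing header and drop parameters such as "; charset=...".
   content_type = r.headers.get("content-type", "").split(";")[0].strip().lower()
   return content_type in image_formats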

answered Feb 21 '18 at 15:31

Kraviz

Checking the MIME type from the response headers is a much better approach, but I would be wary of relying on a `HEAD` request: I've heard that some websites don't handle it correctly, and a `GET` request may serve better. That said, the case I'm thinking of involved the `content-size` header, not the `content-type` header, so who knows. – Xevion Apr 03 '20 at 14:19

For two different URLs, one an image and one not, `r.headers["content-type"]` was "text/html; charset=iso-8859-1" in both cases, so this function returned False for both. Probing deeper, the reason seems to be that my "image" URL redirects to a new URL where the image actually lives. The redirect is seamless in a browser or when downloading, but the header only reports an image if you trace the redirects manually to the final URL. With that URL, the routine returns True. So use this routine with caution: it returns False more often than you might expect. – sh37211 Jun 17 '21 at 14:55
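
The two comments above point at two weaknesses: a `Content-Type` value can carry parameters (as in `text/html; charset=iso-8859-1`) or come from a redirect page, and a `HEAD` request may be refused where a `GET` would succeed. The sketch below is one possible way to handle both, using only the standard library rather than `requests`. The helper names `is_image_content_type` and `url_points_to_image`, the accepted-type tuple and the 10-second timeout are assumptions for illustration, not part of the answer.

```python
import http.client
import urllib.request

# Assumed allow-list; extend it to whatever formats the site accepts.
IMAGE_TYPES = ("image/png", "image/jpeg", "image/gif")


def is_image_content_type(header_value):
    """Return True if a raw Content-Type value names an accepted image type."""
    if not header_value:
        return False
    # Drop parameters such as "; charset=..." and normalise case.
    media_type = header_value.split(";", 1)[0].strip().lower()
    return media_type in IMAGE_TYPES


def url_points_to_image(url, timeout=10):
    """Try HEAD first, then fall back to GET if HEAD fails (hypothetical helper)."""
    for method in ("HEAD", "GET"):
        try:
            request = urllib.request.Request(url, method=method)
            # urlopen follows redirects by default, so the headers come
            # from the final URL rather than the redirecting one.
            with urllib.request.urlopen(request, timeout=timeout) as response:
                return is_image_content_type(response.headers.get("Content-Type"))
        except (OSError, ValueError, http.client.HTTPException):
            continue  # request rejected, bad URL or network error: try the next method
    return False
```

The `GET` fallback closes the response without reading the body, so it costs little more than the `HEAD` attempt. Like the original function, it trusts whatever the server declares.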

score 13 · Answer 2 · edited Sep 18 '19 at 20:33

This is one quick way to do it:

It doesn't verify that the URL really points to an image file; it just guesses from the file extension and then checks that the URL exists. If you need to verify that the data returned from the URL is actually an image (for security reasons), this solution will not work.

import mimetypes
import urllib.request

def is_url_image(url):
    # Guess from the file extension only; no request is made here.
    mimetype, encoding = mimetypes.guess_type(url)
    return (mimetype and mimetype.startswith('image'))

def check_url(url):
    """Returns True if the url returns a 2xx response code,
       otherwise return False.
    """
    try:
        # Request only the first few bytes instead of the whole file.
        headers = {
            "Range": "bytes=0-10",
            "User-Agent": "MyTestAgent",
            "Accept": "*/*"
        }

        req = urllib.request.Request(url, headers=headers)
        response = urllib.request.urlopen(req)
        return 200 <= response.status < 300
    except Exception:
        return False

def is_image_and_ready(url):
    return is_url_image(url) and check_url(url)

edited Sep 18 '19 at 20:33

Rob

answered May 11 '12 at 00:40

MattoTodd

A `HEAD` request would probably do, too. – 9000 May 11 '12 at 00:44
I have found that more sites/servers support the `Range` header than respond to a `HEAD` request, even though that's what a `HEAD` request is for. – MattoTodd May 11 '12 at 00:46
Curious. Is range `0-10` arbitrary? Could you, for example, request `0-0`? Seems to be valid to do so: http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.35.1 – Mattie May 11 '12 at 01:40
I think the import statement should say 'mimetypes'. – Michiel Kauw-A-Tjoe Dec 19 '13 at 12:23
In case of parameters in the given URL:

    import mimetypes

    def is_url_image(url):
        mimetype, encoding = mimetypes.guess_type(url.split("?")[0])
        return (mimetype and mimetype.startswith('image'))

– benjamin berhault Nov 29 '20 at 10:45
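
The query-string fix in the last comment can be generalised by parsing the URL instead of splitting on `?`, which also copes with fragments. This is a minimal sketch, assuming the extension is all you want to look at. `looks_like_image_url` is a hypothetical name, and like the answer above it only guesses from the file name and never contacts the server.

```python
import mimetypes
import posixpath
from urllib.parse import urlsplit


def looks_like_image_url(url):
    """Guess from the URL path's extension whether it names an image."""
    # urlsplit separates the path from any query string and fragment.
    path = urlsplit(url).path
    # Only the last path segment matters for the extension lookup.
    filename = posixpath.basename(path)
    mimetype, _encoding = mimetypes.guess_type(filename)
    return bool(mimetype and mimetype.startswith("image/"))
```

A URL such as `http://example.com/index.html?img=photo.png` is rejected here because the extension is taken from the path, not from the query string.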

score 4 · Answer 3 · answered Feb 15 '18 at 04:52

You can read the headers of the HTTP response; they contain metadata such as the content type.

On Python 3:

from urllib.request import urlopen

image_formats = ("image/png", "image/jpeg", "image/gif")
url = "http://localhost/img.png"
site = urlopen(url)
meta = site.info()  # headers of the HTTP response
# get_content_type() drops parameters such as "; charset=..."
if meta.get_content_type() in image_formats:  # is the content type an image?
    print("it is an image")

You can also read other information, such as the size of the image (the `Content-Length` header, when the server sends it). The advantage is that the image body is not read unless you call `site.read()`. The check can be fooled if the header claims an image and the content is something else, but you can still download the image as a final check once it passes this first filter.
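
The two-stage idea described above (filter on the headers first, download only if the filter passes) might look like the following. This is a sketch under assumptions: `passes_header_filter`, `fetch_if_image` and the 5 MB cap are made-up names and values, and `Content-Length` is treated as optional because servers do not always send it.

```python
from urllib.request import urlopen

IMAGE_FORMATS = ("image/png", "image/jpeg", "image/gif")
MAX_BYTES = 5 * 1024 * 1024  # assumed upper bound on an acceptable image


def passes_header_filter(content_type, content_length, max_bytes=MAX_BYTES):
    """First filter: accept only known image types within the size cap."""
    if content_type not in IMAGE_FORMATS:
        return False
    if content_length is None:
        return True  # size unknown; the read in fetch_if_image is still capped
    try:
        return 0 <= int(content_length) <= max_bytes
    except ValueError:
        return False  # malformed Content-Length


def fetch_if_image(url, timeout=10):
    """Return the body bytes, or None if the header filter or size cap fails.

    Network errors are not caught here and propagate to the caller.
    """
    with urlopen(url, timeout=timeout) as site:
        meta = site.info()
        if not passes_header_filter(meta.get_content_type(), meta["Content-Length"]):
            return None
        # Read at most one byte past the cap so an oversized body is detectable.
        data = site.read(MAX_BYTES + 1)
    return data if len(data) <= MAX_BYTES else None
```

The bytes returned by `fetch_if_image` still need to be inspected before they can be trusted as an image; the headers only say what the server claims.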

score 1 · Answer 4 · answered May 11 '12 at 00:51

Take a look at `imghdr`, a standard-library module (deprecated in Python 3.11 and removed in 3.13).

Here is some example code:

import imghdr  # standard library up to Python 3.12
import http.client
import io

conn = http.client.HTTPConnection('www.ovguide.com', timeout=60)
path = '/img/global/ovg_logo.png'
conn.request('GET', path)
r1 = conn.getresponse()

# imghdr.what() accepts a file-like object in place of a filename
image_file_obj = io.BytesIO(r1.read())
what_type = imghdr.what(image_file_obj)

print(what_type)

This should print 'png'. If the data is not an image, `imghdr.what` returns None.

Hope that helps!

-Blake

answered May 11 '12 at 00:51

Blake Visin

If you absolutely want to be sure it's an image, this is the way to go, but it comes at the cost of retrieving the whole image file first. – MattoTodd May 11 '12 at 00:53
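
The cost mentioned in this comment can be reduced, because the common formats identify themselves in their first few bytes. The sketch below checks those signatures by hand instead of calling `imghdr`. It covers only PNG, JPEG and GIF, and the names `sniff_image_type` and `sniff_url`, the 32-byte read and the timeout are assumptions for illustration.

```python
import urllib.request

# Leading byte signatures ("magic numbers") of three common formats.
SIGNATURES = (
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
)


def sniff_image_type(data):
    """Return 'png', 'jpeg' or 'gif' if data starts with a known signature, else None."""
    for signature, name in SIGNATURES:
        if data.startswith(signature):
            return name
    return None


def sniff_url(url, timeout=10):
    """Fetch only the first bytes of url and sniff them (hypothetical helper)."""
    # Ask for a small range; a server that ignores Range sends the full body,
    # but read(32) still stops after 32 bytes.
    request = urllib.request.Request(url, headers={"Range": "bytes=0-31"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return sniff_image_type(response.read(32))
```

A matching signature shows only that the file starts like an image. Confirming that the whole file decodes would still require downloading it and opening it with an image library.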
