python: check if url to jpg exists

Question

In python, how would I check if a url ending in .jpg exists?

ex: http://www.fakedomain.com/fakeImage.jpg

thanks

Please give details by editing the question description to address these points: What qualifies as "exists"? How does it differ from "an HTTP GET request to that URL succeeds"? — bignose, Mar 21 '10 at 08:39

score 53 · Answer 1 · edited May 23 '17 at 11:47


The code below is equivalent to tikiboy's answer, but uses the high-level, easy-to-use requests library.

import requests

def exists(path):
    r = requests.head(path)
    return r.status_code == requests.codes.ok

print(exists('http://www.fakedomain.com/fakeImage.jpg'))

The requests.codes.ok equals 200, so you can substitute the exact status code if you wish.

requests.head may raise an exception if the server doesn't respond, so you might want to wrap the call in a try-except block.

Also if you want to include codes 301 and 302, consider code 303 too, especially if you dereference URIs that denote resources in Linked Data. A URI may represent a person, but you can't download a person, so the server will redirect you to a page that describes this person using 303 redirect.
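The two suggestions above (catching exceptions and accepting redirect codes) could be combined roughly as follows. This is only a sketch: the name ACCEPTED_CODES, the timeout value, and the choice of which codes count as "exists" are assumptions for illustration, not part of the original answer.

```python
import requests

# Status codes treated as "the resource exists". The redirect codes are the
# ones mentioned above (301, 302, 303); adjust to your own definition.
ACCEPTED_CODES = {200, 301, 302, 303}


def exists(url, timeout=5.0):
    """Return True if a HEAD request to `url` yields an accepted status code."""
    try:
        # allow_redirects=False keeps the original 3xx status visible
        # instead of following the redirect.
        response = requests.head(url, allow_redirects=False, timeout=timeout)
    except requests.RequestException:
        # Covers connection errors, timeouts, malformed URLs, and so on.
        return False
    return response.status_code in ACCEPTED_CODES
```

With this version, a server that is down or a URL that cannot be parsed yields False rather than a traceback.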

edited May 23 '17 at 11:47

Community


answered Oct 25 '13 at 06:26

Mirzhan Irkegulov

This answer looks the simplest and the most normal way to do this now. See http://stackoverflow.com/questions/2018026/should-i-use-urllib-or-urllib2-or-requests – brita_ Apr 29 '14 at 14:28
Works perfectly in Python 3.5 opposite to other answers. – Eskapp Feb 04 '17 at 20:28
`[ 301, 302, 303, 307, 308, 200 ]` should be the correct codes to look for according to [Reference](https://stackoverflow.com/a/42138726/3967709) – Gokul May 28 '18 at 18:04

score 35 · Accepted Answer · edited Aug 11 '11 at 17:57


>>> import httplib
>>>
>>> def exists(site, path):
...     conn = httplib.HTTPConnection(site)
...     conn.request('HEAD', path)
...     response = conn.getresponse()
...     conn.close()
...     return response.status == 200
...
>>> exists('www.fakedomain.com', '/fakeImage.jpg')
False

If the status is anything other than a 200, the resource doesn't exist at the URL. This doesn't mean that it's gone altogether. If the server returns a 301 or 302, this means that the resource still exists, but at a different URL. To alter the function to handle this case, the status check line just needs to be changed to return response.status in (200, 301, 302).

edited Aug 11 '11 at 17:57

orokusaki


answered Mar 21 '10 at 08:27

tikiboy


+1, although I'd imagine using `HEAD` instead of `GET` in the call to `conn.request` would be more efficient, since you're only checking to see if it exists. – Daniel Roseman Mar 21 '10 at 10:58
@Daniel, thanks for that tip. I've updated the code to use HEAD. – tikiboy Mar 21 '10 at 17:48
If you are seeing errors similar to: **"gaierror: [Errno 8] nodename nor servname provided, or not known"** make sure that your 'site' value does not include `http://`, `ftp://`, etc. Instead it seems that httplib will attempt to derive the correct protocol or requires the appropriate port number to be specified (see additional comment below). – bluebinary Aug 20 '13 at 19:44
Furthermore, if you get the error **"InvalidURL: nonnumeric port: '//www.fakedomain.com'"**, make sure you add the appropriate port number to your 'site' URL. In my case, this meant changing `http://www.fakedomain.com` to `www.fakedomain.com:80` which solved this issue. Indeed in reviewing the documentation for httplib on python.org, I noticed that the examples listed exclude the protocol definition from the URL: http://docs.python.org/2/library/httplib.html – bluebinary Aug 20 '13 at 19:46
check this: http://stackoverflow.com/questions/2018026/should-i-use-urllib-or-urllib2-or-requests for a comparison of the different libs that could be used for this. Requests seems to be the most popular. – brita_ Apr 29 '14 at 14:07
Does it work for this URL: http://www.hdwallpapers4ipad.com/_ph/13/426699792.jpg ? I have found that a URL can end in .jpg and still not actually be an image. – Shane May 12 '14 at 20:14
In some cases you can get a 405 (method not allowed). In that case, you can retry with a GET. – Save Aug 27 '20 at 11:05

score 7 · Answer 3 · answered Mar 22 '10 at 01:33


thanks for all the responses everyone, ended up using the following:

import urllib2

try:
  f = urllib2.urlopen(urllib2.Request(url))
  deadLinkFound = False
except Exception:  # any failure (HTTP error, bad URL, network error) marks the link as dead
  deadLinkFound = True

answered Mar 22 '10 at 01:33

user257543


Short n' sweet. I used this myself as my URL string(s) (about 5000 of them) were the full URI --I didn't want to get too detailed. I was also able to assume that I'd receive a 404 and not a redirect. Not sure if this would work with a redirect. – Ben Keating Feb 01 '11 at 22:42
Well, this will also give True on URL errors, and even on 301, 302 and 303 responses. – Yugal Jindle Aug 23 '11 at 08:52

score 4 · Answer 4 · answered Mar 29 '13 at 04:38


There are problems with the previous answers when the file is on an FTP server (ftp://url.com/file). The following code works when the file is served over ftp, http or https:

import urllib2

def file_exists(url):
    request = urllib2.Request(url)
    request.get_method = lambda : 'HEAD'
    try:
        response = urllib2.urlopen(request)
        return True
    except:
        return False

answered Mar 29 '13 at 04:38

XavierCLL


I couldn't get any of the previous answers to return False when I entered a bad file URL, but this answer worked great! – Darkhydro Jan 21 '14 at 22:57
Is there a way to do same thing with urllib3? – MehmedB Jul 12 '19 at 14:38
not exactly like this, for urllib3 requires some changes – XavierCLL Jul 13 '19 at 15:11

score 4 · Answer 5 · edited Mar 08 '12 at 14:31


It looks like http://www.fakedomain.com/fakeImage.jpg is automatically redirected to http://www.fakedomain.com/index.html without any error.

Redirects for 301 and 302 responses are followed automatically, without the redirect response being returned to the caller.

Take a look at HTTPRedirectHandler; you might need to subclass it to handle that.

Here is a sample from Dive Into Python:

http://diveintopython3.ep.io/http-web-services.html#redirects
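As an illustration of the subclassing idea (this is not the Dive Into Python sample itself), a Python 3 handler that refuses to follow redirects might look like the sketch below. The names NoRedirectHandler and head_status and the timeout are hypothetical; network failures such as an unreachable host are deliberately not caught here and will propagate as URLError.

```python
import urllib.error
import urllib.request


class NoRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Redirect handler that refuses to follow any redirect."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None tells urllib not to follow the redirect; the opener
        # then raises HTTPError carrying the original 3xx status code.
        return None


def head_status(url, timeout=5.0):
    """Return the status code of a HEAD request without following redirects."""
    opener = urllib.request.build_opener(NoRedirectHandler())
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # Covers both the unfollowed 3xx responses and 4xx/5xx errors.
        return error.code
```

A caller can then decide for itself whether a 301 or 302 counts as "exists", instead of silently landing on index.html.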

edited Mar 08 '12 at 14:31

Bill the Lizard


answered Mar 21 '10 at 06:32

YOU


I think fakedomain.com is just an example, as the name suggests, so you don't actually need to visit it yourself. :-) – Young Mar 21 '10 at 07:10
@SpawnCxy, at first I thought so too, but when I go to that URL, fakeImage.jpg does not exist and the request is redirected to index.html, so I am assuming it's more than an example. – YOU Mar 21 '10 at 07:31

score 2 · Answer 6 · answered Mar 21 '10 at 13:22


Try it with mechanize:

import mechanize

br = mechanize.Browser()
br.set_handle_redirect(False)
try:
    br.open_novisit('http://www.fakedomain.com/fakeImage.jpg')
    print('OK')
except Exception:
    print('KO')

answered Mar 21 '10 at 13:22

systempuntoout


https://kite.com/python/docs/mechanize.Browser.open_novisit says, it doesn't send response - It has to send a response right ? – Areza Feb 12 '20 at 22:08

score 1 · Answer 7 · answered Nov 08 '16 at 22:08


This might be good enough to see if a url to a file exists.

import urllib
if urllib.urlopen('http://www.fakedomain.com/fakeImage.jpg').code == 200:
  print 'File exists'

answered Nov 08 '16 at 22:08

z3moon


score 0 · Answer 8 · answered Sep 05 '18 at 11:52

in Python 3.6.5:

import http.client

def exists(site, path):
    connection = http.client.HTTPConnection(site)
    connection.request('HEAD', path)
    response = connection.getresponse()
    connection.close()
    return response.status == 200

exists("www.fakedomain.com", "/fakeImage.jpg")

In Python 3, the httplib module has been renamed to http.client.

You also need to remove http:// or https:// from the site argument, because http.client treats the text after the colon as a port number, and the port number must be numeric.
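Rather than stripping the scheme by hand, one option is to let urllib.parse.urlsplit break a full URL into the pieces http.client expects. The sketch below assumes plain http or https URLs without embedded credentials; the single-argument signature and the timeout are illustrative choices, not part of the answer above.

```python
import http.client
from urllib.parse import urlsplit


def exists(url, timeout=5.0):
    """Return True if a HEAD request to the full `url` answers 200."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        connection_class = http.client.HTTPSConnection
    elif parts.scheme == "http":
        connection_class = http.client.HTTPConnection
    else:
        raise ValueError("unsupported URL scheme: %r" % parts.scheme)

    # netloc is "host" or "host:port", which is what HTTPConnection expects.
    connection = connection_class(parts.netloc, timeout=timeout)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        connection.request("HEAD", path)
        return connection.getresponse().status == 200
    finally:
        connection.close()
```

Called as exists("http://www.fakedomain.com/fakeImage.jpg"), this avoids the "nonnumeric port" error because the scheme never reaches HTTPConnection.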

score 0 · Answer 9 · answered Feb 27 '20 at 06:13

Python 3:

import requests

def url_exists(url):
    """Check whether the resource at url exists."""
    if not url:
        raise ValueError("url is required")
    try:
        resp = requests.head(url)
        return resp.status_code == 200
    except Exception:
        return False

score 0 · Answer 10 · answered Dec 17 '22 at 13:22

The answer from @z3moon is good, but I think it is for Python 2.x. For Python 3.x, you need to import urllib.request and call urlopen from it.

import urllib.request

def check_valid_URLs(url) -> bool:
  try:
    return urllib.request.urlopen(url).code == 200
  except Exception:
    return False

score -1 · Answer 11 · answered Mar 21 '10 at 06:17


I think you can try sending an HTTP request to the URL and reading the response. If no exception is raised, it probably exists.

answered Mar 21 '10 at 06:17

Young


that's what I tried doing but I couldn't find any specific code samples. Would you happen to have one? – user257543 Mar 21 '10 at 06:33
