
In Python, how would I check whether a URL ending in .jpg exists?

For example: http://www.fakedomain.com/fakeImage.jpg

Thanks

user257543
  • Please give details by editing the question description to address these points: What qualifies as "exists"? How does it differ from "an HTTP GET request to that URL succeeds"? – bignose Mar 21 '10 at 08:39

11 Answers


The code below is equivalent to tikiboy's answer, but uses the high-level, easy-to-use requests library.

import requests

def exists(path):
    # HEAD fetches only the headers, not the image itself
    r = requests.head(path)
    # requests.codes.ok is 200
    return r.status_code == requests.codes.ok

print(exists('http://www.fakedomain.com/fakeImage.jpg'))

requests.codes.ok equals 200, so you can substitute the literal status code if you prefer.

requests.head may raise an exception if the server doesn't respond, so you might want to wrap the call in a try-except block.

Also, if you want to accept codes 301 and 302, consider 303 too, especially if you dereference URIs that denote resources in Linked Data. A URI may represent a person, but you can't download a person, so the server will use a 303 redirect to send you to a page that describes this person.
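A minimal sketch of the try-except version suggested above. The 5-second timeout is an arbitrary assumption, not part of the original answer. Rather than listing redirect codes, it passes allow_redirects=True so requests follows the redirect chain and judges the final URL:

```python
import requests


def exists(path, timeout=5):
    """Return True if a HEAD request to path ends in 200 OK.

    DNS errors, refused connections, timeouts and malformed URLs all
    raise subclasses of requests.RequestException, which are treated
    here as "does not exist".
    """
    try:
        # requests.head does not follow redirects unless asked to,
        # so opt in and check the status of the final URL.
        r = requests.head(path, allow_redirects=True, timeout=timeout)
    except requests.RequestException:
        return False
    return r.status_code == requests.codes.ok
```

Following redirects has a catch: a server that redirects missing files to its home page will look like a hit. If that matters, leave allow_redirects off and inspect the 3xx code instead.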

Mirzhan Irkegulov
  • This answer looks the simplest and the most normal way to do this now. See http://stackoverflow.com/questions/2018026/should-i-use-urllib-or-urllib2-or-requests – brita_ Apr 29 '14 at 14:28
  • Works perfectly in Python 3.5, unlike the other answers. – Eskapp Feb 04 '17 at 20:28
  • `[ 301, 302, 303, 307, 308, 200 ]` should be the correct codes to look for according to [Reference](https://stackoverflow.com/a/42138726/3967709) – Gokul May 28 '18 at 18:04
>>> import httplib
>>>
>>> def exists(site, path):
...     # site is a bare host name such as 'www.fakedomain.com' (no http://)
...     conn = httplib.HTTPConnection(site)
...     conn.request('HEAD', path)
...     response = conn.getresponse()
...     conn.close()
...     return response.status == 200
...
>>> exists('www.fakedomain.com', '/fakeImage.jpg')
False

If the status is anything other than 200, the resource doesn't exist at that URL. That doesn't mean it's gone altogether: a 301 or 302 means the resource still exists, but at a different URL. To handle this case, change the status check line to `return response.status in (200, 301, 302)`.
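The same idea can be pushed a little further by sorting status codes into buckets. This is a minimal sketch; the function name classify_status and the bucket names are hypothetical, and the redirect set follows the codes suggested in a comment on the requests answer (301, 302, 303, 307, 308):

```python
# 200 OK: the resource is at this URL.
OK = 200
# Redirects: the resource exists, but the server points to another URL.
# 301 Moved Permanently, 302 Found, 303 See Other,
# 307 Temporary Redirect, 308 Permanent Redirect.
REDIRECT_CODES = frozenset({301, 302, 303, 307, 308})


def classify_status(status):
    """Map an HTTP status code to 'exists', 'moved' or 'other'.

    'other' covers 4xx and 5xx responses, e.g. 404 Not Found.
    """
    if status == OK:
        return "exists"
    if status in REDIRECT_CODES:
        return "moved"
    return "other"


for code in (200, 302, 404):
    print(code, classify_status(code))
# 200 exists
# 302 moved
# 404 other
```

In the httplib function above, `return classify_status(response.status) != "other"` would accept both 200 and the redirect codes.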

tikiboy
  • +1, although I'd imagine using `HEAD` instead of `GET` in the call to `conn.request` would be more efficient, since you're only checking to see if it exists. – Daniel Roseman Mar 21 '10 at 10:58
  • @Daniel, thanks for that tip. I've updated the code to use HEAD. – tikiboy Mar 21 '10 at 17:48
  • If you are seeing errors similar to: **"gaierror: [Errno 8] nodename nor servname provided, or not known"** make sure that your 'site' value does not include `http://`, `ftp://`, etc. Instead it seems that httplib will attempt to derive the correct protocol or requires the appropriate port number to be specified (see additional comment below). – bluebinary Aug 20 '13 at 19:44
  • Furthermore, if you get the error **"InvalidURL: nonnumeric port: '//www.fakedomain.com'"**, make sure you add the appropriate port number to your 'site' URL. In my case, this meant changing `http://www.fakedomain.com` to `www.fakedomain.com:80` which solved this issue. Indeed in reviewing the documentation for httplib on python.org, I noticed that the examples listed exclude the protocol definition from the URL: http://docs.python.org/2/library/httplib.html – bluebinary Aug 20 '13 at 19:46
  • check this: http://stackoverflow.com/questions/2018026/should-i-use-urllib-or-urllib2-or-requests for a comparison of the different libs that could be used for this. Requests seems to be the most popular. – brita_ Apr 29 '14 at 14:07
  • Does it work for this URL: http://www.hdwallpapers4ipad.com/_ph/13/426699792.jpg? I have found that a URL can end in .jpg and not actually be an image. – Shane May 12 '14 at 20:14
  • In some cases you can get a 405 (Method Not Allowed). In that case, you can retry with a GET. – Save Aug 27 '20 at 11:05
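Following the last comment, a minimal sketch that sends HEAD first and retries with GET only when the server answers 405 Method Not Allowed. It swaps httplib for Python 3's urllib.request so it can take the full URL, scheme included; the function name and the 10-second timeout are hypothetical choices:

```python
import urllib.error
import urllib.request


def exists_head_then_get(url, timeout=10):
    """Return True if url answers 200, retrying with GET after a 405."""
    for method in ("HEAD", "GET"):
        try:
            request = urllib.request.Request(url, method=method)
            # urlopen follows redirects, so this is the final response's status.
            with urllib.request.urlopen(request, timeout=timeout) as response:
                return response.getcode() == 200
        except urllib.error.HTTPError as err:
            # HTTPError must be caught before OSError, its base class.
            if err.code == 405 and method == "HEAD":
                continue  # server refuses HEAD; try again with GET
            return False
        except (OSError, ValueError):
            # Unreachable host, timeout, or malformed URL.
            return False
    return False
```

With GET, urlopen only reads the status line and headers before the `with` block closes the response, so the image body is not downloaded in full.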

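Thanks for all the responses, everyone. I ended up using the following:

import urllib2

url = 'http://www.fakedomain.com/fakeImage.jpg'
try:
  # urlopen raises an exception (e.g. HTTPError for a 404) for a dead link
  f = urllib2.urlopen(urllib2.Request(url))
  f.close()
  deadLinkFound = False
except Exception:
  deadLinkFound = True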
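As a comment below points out, one catch-all except lumps every failure together. A Python 3 sketch (the function name check_link and its return strings are hypothetical) that keeps the cases apart by catching HTTPError, meaning the server answered with an error status, before URLError, meaning it could not be reached at all:

```python
import urllib.error
import urllib.request


def check_link(url, timeout=10):
    """Return 'ok', 'http <code>', 'unreachable' or 'invalid url'."""
    try:
        request = urllib.request.Request(url)
    except ValueError:
        # Raised for strings without a scheme, e.g. 'not a url'.
        return "invalid url"
    try:
        with urllib.request.urlopen(request, timeout=timeout):
            return "ok"
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404.
        return "http %d" % err.code
    except urllib.error.URLError:
        # DNS failure, refused connection, missing local file, ...
        return "unreachable"
```

Only "ok" means the link is alive; the other strings tell you why it is dead.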
user257543
  • Short n' sweet. I used this myself, since my URL strings (about 5000 of them) were full URIs and I didn't want to get too detailed. I was also able to assume that I'd receive a 404 and not a redirect. Not sure if this would work with a redirect. – Ben Keating Feb 01 '11 at 22:42
  • Well, this will give True on URL errors too, and even on 301, 302 and 303 responses. – Yugal Jindle Aug 23 '11 at 08:52

The previous answers have problems when the file is on an FTP server (ftp://url.com/file). The following code works whether the file is served over FTP, HTTP or HTTPS:

import urllib2

def file_exists(url):
    request = urllib2.Request(url)
    # Send HEAD instead of GET so only the headers are fetched
    request.get_method = lambda: 'HEAD'
    try:
        response = urllib2.urlopen(request)
        response.close()
        return True
    except Exception:
        # 404s (HTTPError), unreachable hosts (URLError), etc.
        return False
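A Python 3 port, as a sketch: urllib2 became urllib.request, and Request takes a method argument directly (Python 3.3+), so the get_method override isn't needed. As far as I can tell, non-HTTP handlers such as FTP and file ignore the method and simply open the resource:

```python
import urllib.request


def file_exists(url, timeout=10):
    """Return True if url can be opened (http, https, ftp, file, ...)."""
    try:
        # method="HEAD" asks HTTP servers for the headers only.
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout):
            return True
    except (OSError, ValueError):
        # OSError covers URLError and HTTPError (404 etc.), DNS failures,
        # timeouts and missing local files; ValueError covers malformed URLs.
        return False
```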
XavierCLL

It looks like http://www.fakedomain.com/fakeImage.jpg is automatically redirected to http://www.fakedomain.com/index.html without any error.

301 and 302 redirects are followed automatically, without the redirect response ever being passed back to you.

Take a look at HTTPRedirectHandler; you might need to subclass it to handle that.

Here is a sample from Dive Into Python:

http://diveintopython3.ep.io/http-web-services.html#redirects
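A minimal Python 3 sketch of that subclassing idea (the class and function names are hypothetical): overriding redirect_request to return None stops urllib from following the redirect, so the 301 or 302 comes back as an HTTPError whose code you can inspect:

```python
import urllib.error
import urllib.request


class NoRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Redirect handler that refuses to follow 3xx responses."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None means "don't follow"; urllib then falls through to
        # its default error handling and raises HTTPError with the 3xx code.
        return None


def status_without_redirects(url, timeout=10):
    """Return the status code for url without following redirects.

    Returns None if the server could not be reached at all.
    """
    # build_opener replaces the default HTTPRedirectHandler with the subclass.
    opener = urllib.request.build_opener(NoRedirectHandler)
    try:
        request = urllib.request.Request(url, method="HEAD")
        with opener.open(request, timeout=timeout) as response:
            return response.getcode()
    except urllib.error.HTTPError as err:
        # 3xx redirects as well as 4xx/5xx errors end up here.
        return err.code
    except (urllib.error.URLError, ValueError):
        return None
```

For a URL that redirects to index.html, you would expect 301 or 302 here rather than the 200 of the page you were sent to; a 200 then means the image itself is at that URL.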

YOU
  • I think fakedomain.com is just an example name, and you needn't actually visit it yourself. :-) – Young Mar 21 '10 at 07:10
  • @SpawnCxy, at first I thought so too, but when I go to that URL, fakeImage.jpg does not exist and it's redirected to index.html, so I am assuming it's more than an example. – YOU Mar 21 '10 at 07:31

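Try it with mechanize:

import mechanize

br = mechanize.Browser()
# Don't follow redirects, so a redirect to another page isn't silently accepted
br.set_handle_redirect(False)
try:
    # open_novisit fetches the URL without changing the browser's state (history etc.)
    br.open_novisit('http://www.fakedomain.com/fakeImage.jpg')
    print('OK')
except Exception:
    print('KO')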
systempuntoout
  • https://kite.com/python/docs/mechanize.Browser.open_novisit says it doesn't send a response. It has to send a response, right? – Areza Feb 12 '20 at 22:08

This might be good enough to see whether a URL to a file exists:

import urllib

# Python 2: urllib.urlopen returns a response even for errors such as 404
if urllib.urlopen('http://www.fakedomain.com/fakeImage.jpg').code == 200:
  print('File exists')
z3moon

In Python 3.6.5:

import http.client

def exists(site, path):
    # site must be a bare host name such as 'www.fakedomain.com'
    connection = http.client.HTTPConnection(site)
    connection.request('HEAD', path)
    response = connection.getresponse()
    connection.close()
    return response.status == 200

exists("www.fakedomain.com", "/fakeImage.jpg")

In Python 3, the httplib module has been renamed to http.client.

You also need to remove http:// or https:// from your URL, because http.client treats everything after the : as a port number, and the port number must be numeric.
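Instead of stripping the scheme by hand, you can let urllib.parse.urlsplit do it and pick HTTPConnection or HTTPSConnection from the scheme. A sketch, with connection_for as a hypothetical helper name:

```python
import http.client
from urllib.parse import urlsplit


def connection_for(url, timeout=10):
    """Return (connection, path) for a full http:// or https:// URL.

    No network traffic happens here; the connection opens on first request.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn_class = http.client.HTTPSConnection
    elif parts.scheme == "http":
        conn_class = http.client.HTTPConnection
    else:
        raise ValueError("unsupported scheme: %r" % parts.scheme)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    # parts.port is None unless the URL names one; the connection class
    # then falls back to its default port (80 or 443).
    return conn_class(parts.hostname, parts.port, timeout=timeout), path


def exists(url):
    conn, path = connection_for(url)
    try:
        conn.request("HEAD", path)
        return conn.getresponse().status == 200
    finally:
        conn.close()
```

Usage: `exists("https://www.fakedomain.com/fakeImage.jpg")`. Network errors are left to propagate here, so wrap the call in try-except if you want a plain False.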

dengApro

Python 3:

import requests

def url_exists(url):
    """Return True if a HEAD request to url answers 200 OK."""
    if not url:
        raise ValueError("url is required")
    try:
        resp = requests.head(url)
        return resp.status_code == 200
    except requests.RequestException:
        # Connection errors, timeouts, malformed URLs, etc.
        return False
Anthony Awuley

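@z3moon's answer is good, but I think it is for Python 2.x. For Python 3.x, you need to go through the request submodule:

import urllib.request

def check_valid_URLs(url) -> bool:
  try:
    # urlopen raises HTTPError for 4xx/5xx, so a 404 lands in the except branch
    with urllib.request.urlopen(url) as response:
      return response.status == 200
  except Exception:
    # Network errors, HTTP errors and malformed URLs all count as invalid
    return False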
Ahmed

I think you can try sending an HTTP request to the URL and reading the response. If no exception is raised, it probably exists.
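"Probably" matters here: as one comment above notes, a URL ending in .jpg does not have to serve an image. A minimal sketch that also checks the Content-Type header the server reports (looks_like_image is a hypothetical name, and the check trusts the server's header rather than inspecting the bytes):

```python
import urllib.request


def looks_like_image(url, timeout=10):
    """Return True if url opens and reports an image/* Content-Type."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            # get_content_type() falls back to 'text/plain' when the
            # header is missing, so a missing header counts as "not an image".
            content_type = response.headers.get_content_type()
            return content_type.startswith("image/")
    except (OSError, ValueError):
        # HTTP errors, unreachable hosts, timeouts and malformed URLs.
        return False
```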

Young