2

I learned how web scraping works a few days ago and was experimenting with it today. I wanted to know how to test whether a page exists. I looked it up and found "Python check if website exists". I'm using the requests module, and I got this code from the answers:

import requests
request = requests.get('http://www.example.com')
if request.status_code == 200:
    print('Web site exists')
else:
    print('Web site does not exist') 

I tried it out, and since example.com exists, it printed "Web site exists". However, when I tried something I was sure wouldn't exist, like examplewwwwwww.com, it gave me this error. Why does this happen, and how can I keep it from raising an error and have it say that the website does not exist instead?

MicroWither
  • As that page indicates, it throws a ConnectionError https://stackoverflow.com/questions/16778435/python-check-if-website-exists#comment70165050_16778473 – Josh Lee Feb 07 '18 at 14:10
  • There's no server there to give you a status. Read the comments of that link you posted and instead use something like `try... except ConnectionError`. – Matt Hall Feb 07 '18 at 14:14
  • Some sites block you, thinking this is a scraping attempt, because your user agent and other features show that you're not a real browser. This explains why some URLs rejected with 404 actually DO work in the browser – JasonGenX Feb 06 '21 at 17:10

4 Answers

5

You can use try/except like this:

import requests
from requests.exceptions import ConnectionError

try:
    request = requests.get('http://www.example.com')
except ConnectionError:
    print('Web site does not exist')
else:
    print('Web site exists')
Alex K.
1

Just to list my way of doing it, maybe it can be of value for someone:

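import requests

ready = 0
# The original fragment used `break`, so it needs an enclosing loop;
# the limit of three attempts is an arbitrary choice for illustration.
for _ in range(3):
    try:
        response = requests.get('https://github.com')
        if response.ok:  # True for any status code below 400
            ready = 1
            break
    except requests.exceptions.RequestException:
        print("Website not available...")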
Sonia
0

You have to wrap the requests.get call in try/except and handle the various exceptions that might arise, one of which is ConnectionError.

You get this error because receiving a response whose status_code is not 200 and being unable to connect to the requested address at all are two different things.

Here are the exceptions that you might encounter when making requests with the requests library.
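
To make the distinction concrete, here is a minimal sketch that separates "a server answered with some status" from "nothing could be reached". The function name `check_site`, the returned labels, and the 5-second timeout are assumptions made for illustration; they are not part of requests itself.

```python
import requests
from requests.exceptions import ConnectionError, RequestException


def check_site(url, timeout=5):
    """Return a short label describing what happened (hypothetical helper)."""
    try:
        response = requests.get(url, timeout=timeout)
    except ConnectionError:
        # No server answered: DNS lookup failed, connection refused, etc.
        return 'unreachable'
    except RequestException:
        # Any other requests failure, e.g. a malformed URL or a read timeout.
        return 'failed'
    # A server did answer; the status code says whether the page was served.
    return 'ok' if response.status_code == 200 else 'http-error'


if __name__ == '__main__':
    print(check_site('http://www.example.com'))
```

ConnectionError is listed before RequestException because it is a subclass of it; with the order reversed, the more specific branch would never run.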

Ilija
0

You are getting the error because the URL you are requesting is invalid: it has no scheme such as http://. You can easily check for this with a try/except block like this one:

import requests
from requests.exceptions import MissingSchema

try:
    request = requests.get('examplewwwwwww.com')
except MissingSchema:
    print('The provided URL is invalid.')
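
If you would rather repair such a URL than reject it, one option is to prepend a scheme before calling requests.get. This is only a sketch: `ensure_scheme` is a hypothetical helper name, and defaulting to http is an assumption that may not suit every site.

```python
def ensure_scheme(url, default_scheme='http'):
    """Prepend a scheme when the URL has none (hypothetical helper)."""
    # Treat anything containing '://' as already having a scheme.
    if '://' in url:
        return url
    return f'{default_scheme}://{url}'


if __name__ == '__main__':
    print(ensure_scheme('examplewwwwwww.com'))
    print(ensure_scheme('https://github.com'))
```

With the scheme in place, a host that does not exist should surface as a ConnectionError rather than MissingSchema, so that exception still needs handling.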
Szabolcs