
I'm trying to create a program that checks a URL and returns whether the URL leads to a valid web page

I've already written such a program in C# using the MyClient and WebClient classes. I can't find a working equivalent for Python, and since I'm very inexperienced in the language, I'm struggling to come up with anything myself.

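import os

hostname = "google.co.uk"
print(hostname)
# "-n 10" sends 10 echo requests on Windows (on Linux/macOS the count flag is "-c")
response = os.system("ping -n 10 " + hostname)
print(response)
# os.system returns the command's exit status; 0 means the ping succeeded
if response == 0:
    print(hostname, 'is up!')
else:
    print(hostname, 'is down!')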

This program is close to what I want, but it only checks whether a site is up; as far as I know, it can't check specific pages.

Are there any libraries that have the functionality I'm looking for or is there a way to adapt my current code?

H. Siddons
  • Possible duplicate of [Pinging servers in Python](https://stackoverflow.com/questions/2953462/pinging-servers-in-python) – Olvin Roght Jun 12 '19 at 19:22
  • That post seems to be asking whether a server is active. I want to test whether a specific URL exists, not just the domain – H. Siddons Jun 12 '19 at 19:24
  • You should not confuse a ping (ICMP types 0/8) with an HTTP(S) connection (TCP port 80/443). There are many servers that deliver a proper web page while rejecting pings, and servers that answer a ping while not running a web server at all. To validate a URL you have to make an HTTP request. – Klaus D. Jun 12 '19 at 19:26
  • Not enough information. The HTTP protocol has a lot of different response codes that can be taken as a sign that the URL exists on the server. – Olvin Roght Jun 12 '19 at 19:27

1 Answer


You can use requests for that.

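import requests

# The URL needs a scheme (http:// or https://); "url.page" is only a placeholder
r = requests.get("https://url.page", timeout=5)
if r.status_code == 200:
    print("Valid page")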
Igor Servulo
  • There are many HTTP codes that represent an invalid page (501, 500, 404, 403). The most reliable approach is to check for a 200. – Igor Servulo Jun 12 '19 at 19:26
  • But there are a lot of codes that can be accepted as confirmation that the URL path exists. – Olvin Roght Jun 12 '19 at 19:28
  • Since the author wanted a valid page, we're not checking for redirect codes like 302. I agree that there are other HTTP codes that can represent a valid response, but not necessarily a valid page. So the best code to compare against for a simple implementation is 200. – Igor Servulo Jun 12 '19 at 19:31
  • Not really, some of my projects return 3xx on requests to some paths without cookies. There's no standard. I'm not saying your method is wrong, just noting that it is not a 100% universal solution. – Olvin Roght Jun 12 '19 at 19:33
  • I agree with you. – Igor Servulo Jun 12 '19 at 19:36
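
Following up on the comments above, here is a minimal sketch of the same check using only the standard library (swapped in for requests so it runs without installing anything). The function name `page_exists` and the 5-second timeout are assumptions made for illustration, not something from the answer. It treats any 2xx response as a valid page and everything else, including malformed URLs and unreachable hosts, as invalid. As discussed in the comments, that rule will not suit every site.

```python
import urllib.error
import urllib.request


def page_exists(url, timeout=5.0):
    """Return True if `url` answers with a 2xx status (hypothetical helper)."""
    try:
        # urlopen follows redirects by default, so a 3xx that ends on a
        # 2xx page is reported through the final response's status.
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except urllib.error.HTTPError:
        # Raised for 4xx/5xx responses such as 403, 404, 500, 501.
        return False
    except (urllib.error.URLError, ValueError, OSError):
        # Unreachable host, timeout, or a URL without a valid scheme.
        return False


if __name__ == "__main__":
    # Placeholder URL; replace it with the page you want to check.
    url = "https://www.google.co.uk/"
    print(url, "is valid" if page_exists(url) else "is not valid")
```

Some servers reject requests that carry the default Python user agent, so a page that opens in a browser may still be reported as invalid here.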