I have a Python web scraping script, and I need to validate the URL the user enters by testing connectivity to the site, i.e. checking that the website actually exists. Can anyone help me implement this in my code?
Here's my code:
import sys, urllib

while True:
    try:
        url = raw_input('Please input address: ')
        webpage = urllib.urlopen(url)
        print 'Web address is valid'
        break
    except:
        print 'No input or wrong url format, usage: http://www.domainname.com/'
        print 'Please try again'

def wget(webpage):
    print '[*] Fetching webpage...\n'
    page = webpage.read()
    return page

def main():
    sys.argv.append(webpage)
    if len(sys.argv) != 2:
        print '[-] Usage: webpage_get URL'
        return
    print wget(sys.argv[1])

if __name__ == '__main__':
    main()
EDIT: Here is some code I took from another Stack Overflow post. It works on its own, and I just want to integrate it into my code. I have tried to do this myself but keep getting errors. Can anyone help me with this? Here's the code:
from urllib2 import Request, urlopen, URLError

req = Request('http://jfvbhsjdfvbs.com')
try:
    response = urlopen(req)
except URLError, e:
    if hasattr(e, 'reason'):
        print 'We failed to reach a server.'
        print 'Reason: ', e.reason
    elif hasattr(e, 'code'):
        print 'The server couldn\'t fulfill the request.'
        print 'Error code: ', e.code
else:
    print 'URL is good!'
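For reference, here is a rough sketch of what I'm trying to end up with (prompt_for_valid_url is just a name I made up, and I'm assuming the urllib2 check should simply replace the plain urllib.urlopen call in my input loop):

from urllib2 import Request, urlopen, URLError

def prompt_for_valid_url():
    # Keep asking until the site can actually be reached.
    while True:
        url = raw_input('Please input address: ')
        try:
            response = urlopen(Request(url))
            print 'Web address is valid'
            return response
        except URLError, e:
            if hasattr(e, 'reason'):
                print 'We failed to reach a server.'
                print 'Reason: ', e.reason
            elif hasattr(e, 'code'):
                print 'The server couldn\'t fulfill the request.'
                print 'Error code: ', e.code
        except ValueError:
            # urlopen raises ValueError when the URL has no scheme, e.g. "domainname.com"
            print 'Wrong url format, usage: http://www.domainname.com/'
        print 'Please try again'

def wget(webpage):
    print '[*] Fetching webpage...\n'
    return webpage.read()

def main():
    webpage = prompt_for_valid_url()
    print wget(webpage)

if __name__ == '__main__':
    main()

Is this the right way to combine the two, or is there a cleaner approach?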