Check whether a URL input is active in a Python script

Question

I have a Python web-scraping script, and I need to validate that the URL a user enters points to an existing website by testing connectivity to it. Can anyone help me implement this in my code?

Here's my code:

import sys, urllib

while True:
    try:
        url= raw_input('Please input address: ')
        webpage=urllib.urlopen(url)
        print 'Web address is valid'
        break
    except:
        print 'No input or wrong URL format. Usage: http://www.domainname.com/'
        print 'Please try again'

def wget(webpage):
    print '[*] Fetching webpage...\n'
    page = webpage.read()
    return page

def main():
    sys.argv.append(webpage)
    if len(sys.argv) != 2:
        print '[-] Usage: webpage_get URL'
        return
    print wget(sys.argv[1])

if __name__ == '__main__':
    main()

EDIT: Here is some code I took from another Stack Overflow post. It works, and I just want to integrate it into my script. I tried to do that myself but got errors. Can anyone help me do this? Here's the code:

from urllib2 import Request, urlopen, URLError
req = Request('http://jfvbhsjdfvbs.com')
try:
    response = urlopen(req)
except URLError, e:
    if hasattr(e, 'reason'):
        print 'We failed to reach a server.'
        print 'Reason: ', e.reason
    elif hasattr(e, 'code'):
        print 'The server couldn\'t fulfill the request.'
        print 'Error code: ', e.code
else:
    print 'URL is good!'

Looks nice, only that your `while True` is executed before you call main. — Hyperboreus, Dec 10 '13 at 17:22
I'd rather check the response code, look at [this](http://stackoverflow.com/questions/1140661/python-get-http-response-code-from-a-url) post — Jan Vorcak, Dec 10 '13 at 17:25
Yes, that's what I need, but I don't know how to implement it in my code. So I'm asking whether anyone can help me do this. — user3034404, Dec 10 '13 at 17:29
@user3034404 A Python script is executed top to bottom. In your case: first your `while` with its suite, then the two `def`s (which add the functions to the scope), and then the condition that may invoke `main`. In this order, your `while` is executed first and your `main` last, provided the condition holds. — Hyperboreus, Dec 10 '13 at 17:58
Ah yes, that is because it needs to check whether the user input is valid, e.g. whether it is in the correct format or empty, so it loops until it gets a well-formed URL. However, it accepts any URL like http://www.domain.com/ because the format is correct. I want to add another test that checks connectivity to the URL. — user3034404, Dec 10 '13 at 18:38

score 1 · Answer 1 · answered Dec 10 '13 at 18:01

Maybe this snippet helps you to understand why your main is executed after the while:

print 'Checkpoint Alpha'

while True:
    print 'Checkpoint Bravo'
    if raw_input ('x for break: ') == 'x': break

print 'Checkpoint Charlie'

def main():
    print 'Checkpoint Foxtrott'

print 'Checkpoint Delta'

if __name__ == '__main__':
    print 'Checkpoint Echo'
    main()
    print 'Checkpoint Golf'

print 'Checkpoint Hotel'
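
Applied to the question's script, one way to get the intended order is to move the prompt loop into a function, so that nothing runs until `main` is called. The following is only a sketch in Python 3 (the thread itself uses Python 2). The names `ask_until`, `accept` and `read` are hypothetical, and the `startswith` test is a stand-in for whatever validation the script really needs.

```python
def ask_until(accept, read=input, prompt='Please input address: '):
    """Keep prompting until accept(text) is true, then return the text.

    `read` defaults to the built-in input(); it is a parameter only so the
    loop can be driven by canned answers instead of a keyboard.
    """
    while True:
        text = read(prompt)
        if accept(text):
            return text
        print('Please try again')


def main(read=input):
    # The loop runs here, when main() is called, not while the module
    # is still being executed from top to bottom.
    url = ask_until(lambda text: text.startswith('http://'), read=read)
    print('Web address accepted:', url)
    return url
```

In a real script, `main()` would be called from the usual `if __name__ == '__main__':` guard at the bottom of the file, as in the question.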

Hyperboreus


@KDawG You can take the officer out of the Air Force, but you can't take the Air Force out of the officer. Tally Ho! – Hyperboreus Dec 10 '13 at 22:54

score 0 · Answer 2 · answered Dec 10 '13 at 17:30

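The following should help you. Keep a list of visited URLs and check it inside the `try` block of your `while` loop:

visited = []

while True:
    try:
        url = raw_input('Please input address: ')
        if url in visited:
            print "Already visited. Continue"
            continue  # ask again instead of fetching the same URL twice
        visited.append(url)
        webpage = urllib.urlopen(url)
        [...]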

Arovit


I don't think this is what I need. I need code that checks connectivity to the URL given by the user. – user3034404 Dec 10 '13 at 17:34
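
A connectivity check of that kind could be sketched by wrapping the `urlopen` snippet from the question's EDIT in a function. This version is written for Python 3 (`urllib.request` and `urllib.error`), not the Python 2 `urllib2` used in the thread. The function name `check_url` and its `opener` and `timeout` parameters are hypothetical, introduced only for illustration.

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def check_url(url, opener=urlopen, timeout=10):
    """Return (ok, message) saying whether `url` could be opened.

    `opener` defaults to urllib.request.urlopen; it is a parameter only so
    the function can be exercised without network access.
    """
    try:
        response = opener(url, timeout=timeout)
    except HTTPError as e:
        # HTTPError is a subclass of URLError, so it must be caught first.
        # The server was reached but answered with an error status.
        return False, "The server couldn't fulfill the request. Error code: %s" % e.code
    except URLError as e:
        # No server could be reached (unknown host, refused connection, ...).
        return False, "We failed to reach a server. Reason: %s" % e.reason
    except ValueError as e:
        # urlopen raises ValueError for strings without a usable URL scheme.
        return False, "Not a valid URL: %s" % e
    else:
        response.close()
        return True, "URL is good!"
```

Inside the prompt loop, the script could then call `ok, message = check_url(url)`, print `message`, and leave the loop only when `ok` is true. Note that a successful response says the server answered at that moment, not that the page has the content the scraper expects.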
