I would like to write a Python script that accesses a random hyperlink on a website every second.

I started by validating the URL:

import urllib2

def valid_url(url):
    # Returns True if the URL can be opened, False on any error.
    try:
        urllib2.urlopen(url)
        return True
    except Exception:
        return False

print valid_url('www.python.org')
I can get the hyperlinks using re:
import urllib2
import re
url = 'http://www.python.org/'
page = urllib2.urlopen(url)
page = page.read()
links = re.findall(r"<a.*?\s*href=\"(.*?)\".*?>(.*?)</a>", page)
for link in links:
    print('href: %s, HTML text: %s' % (link[0], link[1]))
aadlv

2 Answers

This will work:

print valid_url('http://www.python.org')

You can see how to handle it here.
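To illustrate why the scheme matters, here is a minimal sketch that checks for it before opening anything. It assumes Python 3 (`urllib.parse` instead of the `urllib2` used in the question), and `has_http_scheme` and `with_default_scheme` are hypothetical helper names, not part of any library:

```python
from urllib.parse import urlparse


def has_http_scheme(url):
    """Return True if the URL has an http(s) scheme and a host part."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def with_default_scheme(url, scheme="http"):
    """Prepend a scheme when the URL has none (hypothetical helper)."""
    if "://" in url:
        return url
    return "%s://%s" % (scheme, url)


if __name__ == "__main__":
    for candidate in ("www.python.org", "http://www.python.org"):
        print(candidate, has_http_scheme(candidate))
```

Without a scheme, `urlparse` puts the whole string into the path, so there is no host to connect to. That is why the bare `'www.python.org'` is rejected.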

If you want to access random hyperlinks, you will have to parse the page to collect the URLs, pick one with random.choice every second (in a loop that calls time.sleep(1)), and open it with urlopen.

If you give more information, I will be able to assist you better.
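A rough sketch of those steps follows. It assumes Python 3, and it swaps the regex from the question for the standard library's `html.parser`. The names `LinkCollector`, `collect_links`, `fetch_page`, and `visit_random_links` are made up for illustration:

```python
import random
import time
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved against base_url."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # urljoin turns relative links into absolute ones.
                self.links.append(urljoin(self.base_url, value))


def collect_links(html, base_url):
    """Return the absolute URLs of all links found in the HTML text."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


def fetch_page(url):
    """Open the URL, read the body, and close the connection."""
    with urlopen(url) as response:
        return response.read()


def visit_random_links(links, visits, delay=1.0, fetch=fetch_page):
    """Fetch a randomly chosen link `visits` times, pausing `delay` seconds."""
    visited = []
    for _ in range(visits):
        target = random.choice(links)
        fetch(target)
        visited.append(target)
        time.sleep(delay)
    return visited
```

The `fetch` parameter is only there so the loop can be tried without touching the network. For example, `visit_random_links(links, 3, delay=0, fetch=print)` just prints three randomly chosen URLs.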

Acsisr

So, this is the script I wanted:

import urllib2
import re
from random import randrange
import time

url = 'http://some web site...'
page = urllib2.urlopen(url)
page = page.read()
# Collect (href, link text) pairs from every anchor tag on the page.
links = re.findall(r"<a.*?\s*href=\"(.*?)\".*?>(.*?)</a>", page)

while True:
    # Pick a random link; only relative links are followed.
    i = randrange(len(links))
    if not links[i][0].startswith('http'):
        n = urllib2.urlopen(url + links[i][0])
        n.read()   # download the page body
        n.close()  # release the connection
        # n.geturl()
        print 'Opened ' + url + links[i][0]
        time.sleep(5)
aadlv