I would like to make python script that will randomly access hyperlinks on some website on every 1 second.
Started with validating url:
def valid_url(url):
try:
urllib2.urlopen(url)
return True
except Exception, e:
return False
print valid_url('www.python.org')
- I can get hyperlink using re
import urllib2 import re url = 'http://www.python.org/' page = urllib2.urlopen(url) page = page.read() links = re.findall(r"<a.*?\s*href=\"(.*?)\".*?>(.*?)</a>", page) for link in links: print('href: %s, HTML text: %s' % (link[0], link[1]))