I have the following problem:
I would like to parse HTML files and extract the links from them. I can get the links with the following code:
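    from html.parser import HTMLParser  # Python 3; in Python 2 the module is named HTMLParser

    class MyHTMLParser(HTMLParser):
        def __init__(self, url):
            HTMLParser.__init__(self)
            self.url = url
            # Per-instance list; a class-level list would be shared by all parsers.
            self.links = []

        def handle_starttag(self, tag, attrs):
            # Collect absolute http: links from <a href="..."> tags.
            if tag == 'a':
                for name, value in attrs:
                    # value is None for attributes written without a value.
                    if name == 'href' and value and value[:5] == "http:":
                        self.links.append(value)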
But I don't want to get audio files, video files, etc. I only want links to HTML pages. How can I do that?
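
One direction that might work, as a rough sketch rather than a confirmed solution: guess the media type from the file extension in the URL path and keep only links that look like HTML. The helper name `looks_like_html` is made up for illustration, and treating URLs without a recognizable extension as HTML is an assumption. The extension is only a hint, since the server decides the real type, so checking the `Content-Type` response header would be more reliable.

```python
import mimetypes
from urllib.parse import urlparse

HTML_TYPES = ("text/html", "application/xhtml+xml")

def looks_like_html(url):
    """Guess from the URL path alone whether a link points to an HTML page."""
    # Use only the path so query strings and fragments do not confuse the guess.
    path = urlparse(url).path
    mime_type, _ = mimetypes.guess_type(path)
    # Assumption: no recognizable extension (e.g. "/about" or "/") counts as HTML.
    return mime_type is None or mime_type in HTML_TYPES
```

With something like this, the check in `handle_starttag` could become `if value[:5] == "http:" and looks_like_html(value):`, so that links ending in `.mp3`, `.mp4`, and similar are skipped.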