I am writing a script to extract only the hyperlinks from a webpage. This is what I have so far:
import bs4 as bs
import urllib.request
source = urllib.request.urlopen('http://www.soc.napier.ac.uk/~40009856/CW/').read()
soup = bs.BeautifulSoup(source, 'lxml')
#for paragraph in soup.find_all('p'):
# print(paragraph.string)
for url in soup.find_all('a'):
    print(url.get('href'))
I want only hyperlinks to other webpages. The output currently also includes links to PDFs and email addresses, which I do not want.
How do I restrict the results to hyperlinks to other webpages?
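To make the goal concrete, here is a rough sketch of the kind of filter I have in mind. It is only a guess at an approach, not something I know to be the right way. The helper names `is_webpage_link` and `filter_webpage_links` are made up for illustration, and it assumes that email links always use the `mailto:` scheme and that PDF links always have a path ending in `.pdf`.

```python
from urllib.parse import urlparse


def is_webpage_link(href):
    """Hypothetical helper: return True if href looks like a link to a webpage."""
    # <a> tags without an href attribute give None from url.get('href').
    if not href:
        return False
    parsed = urlparse(href)
    # Keep http/https links and relative links (empty scheme);
    # this drops mailto: and other non-web schemes.
    if parsed.scheme not in ('', 'http', 'https'):
        return False
    # Assumption: PDF links can be recognised by a path ending in .pdf.
    if parsed.path.lower().endswith('.pdf'):
        return False
    return True


def filter_webpage_links(hrefs):
    """Hypothetical helper: keep only the hrefs that pass is_webpage_link."""
    return [href for href in hrefs if is_webpage_link(href)]
```

In my loop this would mean checking `is_webpage_link(url.get('href'))` before printing each link. A PDF served from a URL that does not end in `.pdf` would still get through, so this is only an approximation.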