I am writing a script to parse a web page and collect all the href links on it. However, when I try to import bs4, I get the error "ImportError: cannot import name 'HTMLParseError'". I am using Python 3.5.2.
From earlier answers I understand that this may be caused by an old version of bs4, so I upgraded it to version 4.5.1. However, the error persists. Is there something wrong with my code (attached below, also adapted from an earlier answer)? Or do I have to find another tool for this task?
Does anyone have any idea? One more thing: I also tried to install lxml, but that failed too (it said "unable to find vcvarsall.bat"). So there are not many tools I can use.
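Since the error persists after the upgrade, one thing that may be worth checking is whether Python is still loading an older copy of bs4 from a different location. The following is a minimal sketch of that check, assuming the standard `importlib.util.find_spec` is available (Python 3.4+); `locate_package` is a hypothetical helper name chosen for illustration, not part of bs4. It only looks the package up and does not import it, so it should work even while the import itself fails.

```python
import importlib.util
import sys


def locate_package(name):
    """Return the file a top-level package would be loaded from, or None.

    Hypothetical helper: it only looks the package up on the import path,
    it does not execute the package's code.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None  # package is not visible to this interpreter
    return spec.origin


if __name__ == "__main__":
    # Which interpreter is running, and which bs4 file it would pick up.
    print("interpreter:", sys.executable)
    print("bs4 found at:", locate_package("bs4"))
```

If the printed path points to a different site-packages directory than the one the upgrade was installed into, the interpreter may simply not be seeing version 4.5.1.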
from bs4 import BeautifulSoup
import urllib.request

def open_html():
    resp = urllib.request.urlopen("http://www.gpsbasecamp.com/national-parks")
    soup = BeautifulSoup(resp, from_encoding=resp.info().get_param('charset'))
    for link in soup.find_all('a', href=True):
        print(link['href'])

if __name__ == '__main__':
    open_html()
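As for another tool: a possible fallback that needs neither bs4 nor lxml is the standard library's `html.parser`. The sketch below is only an illustration under that assumption, not a drop-in replacement for the script above; `LinkCollector` and `extract_links` are hypothetical names, and the function expects HTML that has already been decoded to a string.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html_text):
    """Return all href values found in <a> tags, in document order."""
    collector = LinkCollector()
    collector.feed(html_text)
    collector.close()
    return collector.links
```

For the page in question, the text could come from something like `urllib.request.urlopen(url).read().decode("utf-8", errors="replace")`; the UTF-8 encoding there is an assumption and may need to follow the response's charset instead.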