I am trying to use Beautiful Soup 4 (beautifulsoup4) to parse a series of web pages written in XHTML. I assume that for best results I should pair it with an XML parser, and to my knowledge the only XML parser Beautiful Soup supports is lxml.
However, when I run the following, as shown in the Beautiful Soup documentation:
    import requests
    from bs4 import BeautifulSoup

    r = requests.get('hereiswhereiputmyurl')
    soup = BeautifulSoup(r.content, 'xml')
it results in the following error:
    FeatureNotFound: Couldn't find a tree builder with the features you requested: xml. Do you need to install a parser library?
It's driving me crazy. I have found two other users who posted the same problem.

I followed the post "Installing lxml, libxml2, libxslt on Windows 8.1" to reinstall and update lxml, and I also updated Beautiful Soup, but I still get the error.
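One possible cause, offered as a hypothesis rather than a confirmed diagnosis: Anaconda can contain more than one Python environment, so lxml may have been installed into a different interpreter than the one Spyder runs. Beautiful Soup's `'xml'` feature is provided by lxml, so if lxml cannot be imported from Spyder's interpreter, `FeatureNotFound` is the expected result. The minimal sketch below uses only the standard library (it also runs on Python 3.5) to print which interpreter is running and whether `bs4` and `lxml` can be found from it; the function name `parser_report` is hypothetical.

```python
import importlib.util
import sys


def parser_report(modules=("bs4", "lxml")):
    """Report the running interpreter and whether each module is importable from it.

    parser_report is a hypothetical helper for illustration, not part of bs4 or lxml.
    """
    report = {"python": sys.executable, "version": sys.version.split()[0]}
    for name in modules:
        # find_spec returns None when the module cannot be found on this interpreter's sys.path
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    # Run this from the Spyder console so it inspects the same interpreter your script uses.
    for key, value in parser_report().items():
        print("{}: {}".format(key, value))
```

If `lxml: False` is printed, installing lxml into the environment shown on the `python:` line (rather than into some other environment) would be the thing to try.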
Beautiful Soup works otherwise: the following code gives me the usual wall of markup.

    soup = BeautifulSoup(r.content, 'html.parser')
Here are my specs:

- Windows 8.1
- Python 3.5.2
- I run my code in the Spyder IDE from Anaconda 3 (which, admittedly, I do not know much about)
I'm sure it's a beginner's mistake, since, as I said, I have very little programming experience.

How can I resolve this issue? Or, if it is a known bug, would you recommend that I just use lxml by itself to scrape the data?
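If the pages are well-formed XHTML, parsing them without Beautiful Soup is possible. The sketch below uses the standard library's `xml.etree.ElementTree` as a dependency-free stand-in; `lxml.etree` offers the same `fromstring` and `iterfind` calls, so swapping the import is one option once lxml is importable. The key detail is that XHTML elements live in the `http://www.w3.org/1999/xhtml` namespace, so searches need a namespace map. The names `extract_links`, `XHTML_NS`, and `SAMPLE` are hypothetical, and the sample document stands in for `r.content`.

```python
import xml.etree.ElementTree as ET

# Prefix -> namespace URI map; every element in a true XHTML document is in this namespace.
XHTML_NS = {"x": "http://www.w3.org/1999/xhtml"}


def extract_links(xhtml_bytes):
    """Return the href of every <a> element in a well-formed XHTML document."""
    root = ET.fromstring(xhtml_bytes)  # raises ET.ParseError if the input is not well-formed XML
    return [a.get("href") for a in root.iterfind(".//x:a", XHTML_NS)]


# Hypothetical stand-in for r.content from requests.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Example</title></head>
  <body>
    <p><a href="/one">One</a> and <a href="/two">Two</a></p>
  </body>
</html>"""

if __name__ == "__main__":
    print(extract_links(SAMPLE))  # ['/one', '/two']
```

A strict XML parser rejects pages that are not actually well-formed, which is one reason a lenient HTML parser is sometimes preferred for scraping.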