I used a Python package named "pdfminer" to convert a PDF file to an HTML file, and I want to scrape useful information from it. How can I use XPath and Beautiful Soup on a local HTML file? I know how to use XPath and Beautiful Soup on a web page given a link, like this:
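import requests
from bs4 import BeautifulSoup
from lxml import html

# get tree: download the page and parse it into an lxml element for XPath queries
def get_tree(url):
    r = requests.get(url)
    tree = html.fromstring(r.content)
    return tree

# get soup: download the page and parse it with Beautiful Soup
def get_soup(url):
    r = requests.get(url)
    data = r.text
    soup = BeautifulSoup(data, "html.parser")
    return soup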
Could anyone give me an example of how to use XPath and Beautiful Soup when only an HTML file is given? Thanks.
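
For reference, here is a minimal sketch of what the local-file versions might look like. It relies on two things: `lxml.html.parse` accepts a filename, and `BeautifulSoup` accepts an open file object. The function names `get_tree_from_file` and `get_soup_from_file` are made up for illustration, and the choice of `"html.parser"` is an assumption; another installed parser may suit pdfminer's output better.

```python
from bs4 import BeautifulSoup
from lxml import html


def get_tree_from_file(path):
    # html.parse reads the file and returns an ElementTree;
    # getroot() gives the root element, which supports .xpath()
    return html.parse(path).getroot()


def get_soup_from_file(path):
    # Open in binary mode so Beautiful Soup can detect the encoding itself
    with open(path, "rb") as f:
        return BeautifulSoup(f, "html.parser")
```

With a hypothetical file `output.html` saved from pdfminer, `get_tree_from_file("output.html").xpath("//span/text()")` would return the text of every `span` element, and `get_soup_from_file("output.html").find_all("span")` would return the matching tags. The `span` selector is only a placeholder; the right query depends on the markup in the converted file.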