This is a question about web scraping. I am able to scrape sites using BeautifulSoup, but I want to use XPaths because of Chrome's "Copy XPath" feature, which makes it super easy. My understanding is that XPath is easier because with BeautifulSoup I have to work out the tag names and attributes manually.
For example, below is how I get job titles, but I had to write the 'find' arguments by hand. With XPath, my understanding is that I could just do "Copy XPath" from Chrome's 'Inspect Element' window.
import requests
from bs4 import BeautifulSoup
url = "http://www.indeed.com/jobs?q=hardware+engineer&l=San+Francisco%2C+CA"
r = requests.get(url)
soup = BeautifulSoup(r.content, "html.parser")
# find_all takes a dict of attribute filters, not a set
job_titles = soup.find_all("h2", {"class": "jobtitle"})
jobs_sponsored = soup.find_all("div", {"data-tn-component": "sponsoredJob"})
for title in job_titles:
    print(title.text.strip())

print("SPONSORED JOB LISTINGS")
print("\n")
for sponsored in jobs_sponsored:
    print(sponsored.text.strip())
What would the equivalent code using XPath look like? I am not able to find the library / syntax for extracting content using XPath instead of tag names and attributes.
EDIT: The question is NOT whether I can use XPath with BeautifulSoup (I already know I cannot). The question is: what would some or all of the statements above look like if I wanted to use XPath? What package (I don't have to use BeautifulSoup) do I need to use?
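For what it's worth, here is the kind of thing I was imagining, based on a quick look at lxml. I'm not sure this is the right package or syntax — the `html.fromstring` / `.xpath()` calls are my guesses from its docs, and the HTML snippet below is made up so the example is self-contained:

```python
from lxml import html

# Made-up sample of the kind of markup on the results page,
# just so this snippet runs without hitting the network
snippet = """
<div>
  <h2 class="jobtitle"><a>Hardware Engineer</a></h2>
  <h2 class="jobtitle"><a>Senior Hardware Engineer</a></h2>
</div>
"""

tree = html.fromstring(snippet)

# Hand-written XPath; Chrome's "Copy XPath" would instead give an
# absolute path like /html/body/div/h2[1]
titles = [el.text_content().strip()
          for el in tree.xpath('//h2[@class="jobtitle"]')]
print(titles)  # → ['Hardware Engineer', 'Senior Hardware Engineer']
```

On the real page I would presumably pass `r.content` to `html.fromstring` and paste in whatever path Chrome's "Copy XPath" gives me. Is that roughly how it works?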