Bottom line up front: I want to scrape the job listings from this website: https://www.gdit.com/careers/search/?q=bossier%20city, but every request returns the bare JavaScript shell. If you inspect the page in a browser, you can see the jobs are listed in h3 tags, but no matter what I do, the jobs never show up in the HTML I pull down.
- I tried the following Beautiful Soup code:

import requests
from bs4 import BeautifulSoup

url = "https://www.gdit.com/careers/search/?q=bossier%20city"
html_text = requests.get(url).text
soup = BeautifulSoup(html_text, "html.parser")
print(soup)  # for testing purposes -- only the JS shell comes back
for job in soup.find_all('h3'):
    print(job)
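The parsing itself is fine; the problem is that the raw response never contains the h3 tags. That can be checked against inline HTML (hypothetical markup, not the real page):

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the raw response is an empty JavaScript shell,
# while the browser-rendered DOM contains the job headings.
raw_shell = "<html><body><div id='app'></div><script src='app.js'></script></body></html>"
rendered = "<html><body><h3>Systems Engineer</h3><h3>Network Analyst</h3></body></html>"

shell_jobs = BeautifulSoup(raw_shell, "html.parser").find_all("h3")
rendered_jobs = BeautifulSoup(rendered, "html.parser").find_all("h3")

print(len(shell_jobs))     # 0 -- nothing for find_all('h3') to match
print(len(rendered_jobs))  # 2 -- the tags exist only after the JS runs
```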
- I tried ScraperAPI, which I thought was supposed to render JavaScript for you:
url = "https://www.gdit.com/careers/search/?q=bossier%20city"
params = {'api_key': "MY-API-KEY", 'url': url}
response = requests.get('http://api.scraperapi.com/', params=params)
print(response.text) # No H3 tags of any kind
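If I understand ScraperAPI's docs correctly, JavaScript rendering has to be requested explicitly with a render parameter (an assumption worth checking against your plan). The request can be previewed without sending it:

```python
import requests

url = "https://www.gdit.com/careers/search/?q=bossier%20city"
params = {
    "api_key": "MY-API-KEY",
    "url": url,
    "render": "true",  # assumption: ScraperAPI only runs JS when this is set
}

# Build the request without sending it, to inspect the final URL.
prepared = requests.Request("GET", "http://api.scraperapi.com/", params=params).prepare()
print(prepared.url)

# response = requests.get("http://api.scraperapi.com/", params=params)  # the actual call
```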
- I tried requests-html:

from requests_html import HTMLSession

session = HTMLSession()
r = session.get("https://www.gdit.com/careers/search/?q=bossier%20city")
r.html.render()  # render() updates r.html in place and returns None
print(r.html.html)
- I tried Selenium first and then parsing the result with Beautiful Soup:

import time
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.common import exceptions

url = "https://www.gdit.com/careers/search/?q=bossier%20city"
options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option("detach", True)
options.add_experimental_option("useAutomationExtension", False)
try:
    driver = webdriver.Chrome(options=options, executable_path=r'C:\Users\Notebook\Documents\chromedriver.exe')
    driver.get(url)
    time.sleep(2)  # wait for the JS to run before grabbing the source
    page_source = driver.page_source
    soup = BeautifulSoup(page_source, "html.parser")
    print(soup)
except exceptions.WebDriverException:
    print("You need to download a new version of the Chromedriver.")
Nothing works. Do I have to mimic a user typing "Bossier City" into the search box first and then retrieve the result? Anyway, any help would be appreciated.
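For what it's worth, the q= query string already carries the search term, so simulating typing shouldn't be needed. Once any of the routes above hands back rendered HTML, the extraction step would look roughly like this, sketched against hypothetical markup (the real page's structure and classes may differ; inspect the live DOM):

```python
from bs4 import BeautifulSoup

# Hypothetical rendered markup standing in for driver.page_source / r.html.html.
page_source = """
<div class="job-listing">
  <a href="/careers/job/123"><h3>Systems Engineer</h3></a>
  <a href="/careers/job/456"><h3>Network Analyst</h3></a>
</div>
"""

soup = BeautifulSoup(page_source, "html.parser")
# Pair each heading with the link wrapping it.
jobs = [(h3.get_text(strip=True), h3.find_parent("a")["href"])
        for h3 in soup.find_all("h3")]
for title, link in jobs:
    print(f"{title} -> https://www.gdit.com{link}")
```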