I recently started writing web scraping code in Python. I was able to do a lot using just requests and BeautifulSoup. Then, when I tried the Staples website, I couldn't find the element that is shown in Chrome's Developer Tools. I did a little research and thought the element might be rendered by JavaScript. I tried ghost.py and QtWebKit, but both had SSL issues. Then I tried Selenium + PhantomJS:
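import os
from selenium import webdriver

# phantomjs.exe is expected to sit next to this script
executable_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'phantomjs.exe')
browser = webdriver.PhantomJS(executable_path=executable_path)
browser.get(url)  # url: the Staples page being scraped
html = browser.page_source
browser.save_screenshot('./abc.png')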
screenshot1 is different from screenshot2, which is from Chrome. In Chrome there is a price block that is not shown in the PhantomJS browser. I also tried custom headers, but it made no difference:
headers = {
    'Accept': '*/*',
    'Accept-Encoding': 'gzip, deflate, sdch',
    'Accept-Language': 'en-US,en;q=0.8',
    'Cache-Control': 'max-age=0',
    'User-Agent': ('Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 '
                   '(KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36'),
}
# Iterate over (name, value) pairs; enumerate() would yield (index, name) instead
for key, value in headers.items():
    capability_key = 'phantomjs.page.customHeaders.{}'.format(key)
    webdriver.DesiredCapabilities.PHANTOMJS[capability_key] = value
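To check what the header mapping produces without launching a browser, the same logic can be run in plain Python. This is only a sketch: `build_phantomjs_capabilities` is a hypothetical helper name, the two headers are a shortened sample, and the `phantomjs.page.customHeaders.` prefix is taken from the snippet above.

```python
def build_phantomjs_capabilities(headers, base=None):
    """Return a copy of `base` with one customHeaders entry per header."""
    capabilities = dict(base or {})
    # items() yields (header name, header value) pairs
    for name, value in headers.items():
        capabilities['phantomjs.page.customHeaders.{}'.format(name)] = value
    return capabilities


# Shortened sample of the headers, for illustration only
headers = {'Accept-Language': 'en-US,en;q=0.8', 'Cache-Control': 'max-age=0'}
capabilities = build_phantomjs_capabilities(headers)

# Print the resulting keys and values so they can be inspected
for key in sorted(capabilities):
    print(key, '=', capabilities[key])
```

In Selenium versions that still ship the PhantomJS driver, a dictionary like this would presumably be passed as `desired_capabilities` when creating the driver, and it has to be built before the driver is created for the headers to apply.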
I want to scrape the price from the webpage. Is there some setting for Selenium I can use to get the same webpage as a regular browser?