
I recently started writing web scraping code in Python. I was able to do a lot using just requests and BeautifulSoup. Then, when I tried the Staples website, I couldn't find an element that is shown in Chrome's Developer Tools. I did a little research and thought it might be rendered by JavaScript. I tried ghost.py and QtWebKit, but both had SSL issues. Then I tried Selenium + PhantomJS.

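import os
from selenium import webdriver

# phantomjs.exe is expected to sit next to this script
executable_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'phantomjs.exe')
browser = webdriver.PhantomJS(executable_path=executable_path)
browser.get(url)  # url: the product page to scrape
html = browser.page_source
browser.save_screenshot('./abc.png')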

The PhantomJS screenshot is different from the one taken in Chrome. In Chrome there is a price block, which is not shown in the PhantomJS browser. I also tried custom headers, but that made no difference.

headers = {
    'Accept': '*/*',
    'Accept-Encoding': 'gzip, deflate, sdch',
    'Accept-Language': 'en-US,en;q=0.8',
    'Cache-Control': 'max-age=0',
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36',
}

# Iterate over (name, value) pairs; enumerate() would yield (index, name) instead
for key, value in headers.items():
    capability_key = 'phantomjs.page.customHeaders.{}'.format(key)
    webdriver.DesiredCapabilities.PHANTOMJS[capability_key] = value
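
For reference, here is a minimal sketch of one way to attach the headers, assuming Selenium 3.x, where webdriver.PhantomJS still exists and accepts a desired_capabilities argument. The helper name build_phantomjs_capabilities is hypothetical. It copies a base capabilities dict instead of mutating the shared DesiredCapabilities.PHANTOMJS, so the headers only apply to drivers created with the returned dict.

```python
def build_phantomjs_capabilities(base_capabilities, headers):
    """Return a copy of base_capabilities with one customHeaders entry per header."""
    capabilities = dict(base_capabilities)  # copy; leave the shared dict untouched
    for name, value in headers.items():
        capabilities['phantomjs.page.customHeaders.{}'.format(name)] = value
    return capabilities


# Hypothetical usage (assumes Selenium 3.x with PhantomJS support):
# from selenium import webdriver
# caps = build_phantomjs_capabilities(webdriver.DesiredCapabilities.PHANTOMJS, headers)
# browser = webdriver.PhantomJS(executable_path=executable_path, desired_capabilities=caps)
```

The capabilities have to be in place before the driver is created, since PhantomJS reads them at startup.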

I want to scrape the price from the webpage. Is there some setting for Selenium I can use to get the same webpage as a regular browser?

undetected Selenium
Zhenyu He

1 Answer


I don't see any issue as such in your code block. I took your code and ran the same test, capturing screenshots in the default and maximized modes. Here are the results for the URL http://www.google.com.


PhantomJS (Default Viewport) :

PhantomJS is headless, so nothing is shown on screen. When we start it with its default configuration, the viewportSize {object} property simulates the size of the window in a traditional browser. Hence the initial page is rendered in a portrait-sized viewport, as follows:

  • Minimal Code :

    from selenium import webdriver

    # Raw string: single backslashes are enough
    browser = webdriver.PhantomJS(executable_path=r'C:\Utility\phantomjs-2.1.1-windows\bin\phantomjs.exe')
    browser.get("http://www.google.com")
    html = browser.page_source
    browser.save_screenshot('./Screenshots/PhantomJS_normal.png')
    browser.quit()
    
  • Snapshot :

PhantomJS_normal


PhantomJS (Maximized Viewport) :

But when we start PhantomJS with its default configuration and then call the maximize_window() method, viewportSize simulates the size of the entire screen, as follows:

  • Minimal Code :

    from selenium import webdriver

    browser = webdriver.PhantomJS(executable_path=r'C:\Utility\phantomjs-2.1.1-windows\bin\phantomjs.exe')
    browser.get("http://www.google.com")
    browser.maximize_window()  # enlarge the viewport before taking the screenshot
    html = browser.page_source
    browser.save_screenshot('./Screenshots/PhantomJS_maximize.png')
    browser.quit()
  • Snapshot :

PhantomJS_maximize


Conclusion

The two screenshots above make it clear that PhantomJS starts with a smaller viewport by default, and that calling maximize_window() enlarges it, so more elements become available to interact with. To get most of the page's elements visible within the viewport, you have to maximize the browser.
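
As a variation on the maximize_window() approach, the viewport can also be given an explicit size before the page loads. The sketch below assumes a Selenium WebDriver object, whose set_window_size(width, height), get(url), and page_source members are part of the standard API. The helper name load_with_viewport and the 1920x1080 default are illustrative choices, not something tested against the Staples page.

```python
def load_with_viewport(browser, url, width=1920, height=1080):
    """Resize the browser window, load url, and return the rendered HTML."""
    # Set the size first so the page is laid out for the larger viewport from the start
    browser.set_window_size(width, height)
    browser.get(url)
    return browser.page_source


# Hypothetical usage with an existing driver:
# html = load_with_viewport(browser, "http://www.google.com")
# browser.save_screenshot('./Screenshots/PhantomJS_sized.png')
```

A fixed size gives the same screenshot dimensions on every machine, whereas maximize_window() depends on the screen the driver reports.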

undetected Selenium
  • I tried. This does not help. I actually used **bold font** to emphasize what is different between the two screenshots. Please read the question again and try this [url](https://www.staples.com/dell-i5567-7526gry-15-6-laptop-computer-intel-i7-256gb-ssd-8gb-ddr4-win-10-intel-hd-graphics-620/product_2677757) – Zhenyu He Jan 10 '18 at 00:59