
I am using Selenium to scrape the Amazon search results page. As I was wrapping up, I switched my scraper to headless mode for efficiency. However, in headless mode certain page elements, such as the sponsored brand section, are not available. Everything works fine in non-headless mode, but it fails in headless mode even after setting the following options:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
#options.headless = True
# Give the headless browser a desktop-sized viewport.
options.add_argument("--window-size=1920,1080")
options.add_argument("--disable-extensions")
# No inner quotes: arguments are passed to Chrome verbatim, not through a shell.
options.add_argument("--proxy-server=direct://")
options.add_argument("--proxy-bypass-list=*")
options.add_argument("--start-maximized")
options.add_argument("--headless")
options.add_argument("--disable-gpu")
options.add_argument("--disable-dev-shm-usage")
options.add_argument("--no-sandbox")
options.add_argument("--ignore-certificate-errors")
options.add_argument("--allow-running-insecure-content")
driver = webdriver.Chrome(options=options)

PS: I tried with and without the commented-out line, as well as with only the commented-out line.

For clarification, I took a screenshot of each case: this is what it looks like when I run it in headless mode, and this is what it normally looks like (without headless mode, and in normal user browsing). What else needs to be added for the sponsored brand information to show up in headless mode? I suspect it may be a problem with JavaScript not communicating properly with the browser.

As always, thank you in advance!!

double-beep
Luke Hamilton
  • The Javascript interpreter is PART of the browser. There aren't any communications problems. It's possible (just theorizing here) that headless mode sets a different value for the CSS `media` attribute, and the branding might depend on that. – Tim Roberts Nov 18 '21 at 18:00
  • @TimRoberts That's a great question. I did some research, there does not seem to be a chrome option that can change (or restore) the CSS media attribute. Any idea on how to (in layman's terms) import the same CSS media attributes that appears in the non-headless browser to the headless browser? – Luke Hamilton Nov 18 '21 at 19:00
  • I don't see any significant difference between the two snapshots – undetected Selenium Nov 18 '21 at 19:32
  • @DebanjanB My apologies!! I just noticed that mix up. Please refer back to it again and let me know your thoughts. Thanks!! – Luke Hamilton Nov 18 '21 at 19:45
  • @DebanjanB my apologies! I am rather new to contributing to this platform. How would you recommend I go about being in a mode for you to answer if I already found the answer to my problem? I want to be as beneficial to learning as I can. – Luke Hamilton Nov 18 '21 at 20:24
  • @LukeHamilton I know you are new :) keep asking good questions – undetected Selenium Nov 18 '21 at 20:28

1 Answer


Using the latest Google Chrome v95.0

  • When you use the normal (headed) browser, the following user agent is in use:

    Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36
    
  • Whereas when you use the headless browser, the following user agent is in use:

    Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/95.0.4638.69 Safari/537.36
    

The additional `Headless` token in the user agent is detected as a bot. Hence you see the difference.
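If the `Headless` token in the user agent is what the site reacts to, one thing to try is overriding the user agent so the headless browser reports the same string as the headed one. The following is a minimal sketch, not a confirmed fix: `headed_user_agent` and `build_driver` are hypothetical helper names, the user agent string is the v95 one quoted above, and it assumes Chrome's `--user-agent` switch is honored in headless mode. Whether this alone brings back the sponsored brand section is not verified here.

```python
# User agent reported by headless Chrome v95, as quoted in this answer.
HEADLESS_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) HeadlessChrome/95.0.4638.69 Safari/537.36"
)


def headed_user_agent(user_agent: str) -> str:
    """Return the user agent with the 'Headless' marker stripped (hypothetical helper)."""
    return user_agent.replace("HeadlessChrome", "Chrome")


def build_driver():
    """Start headless Chrome that reports a headed-style user agent (hypothetical helper)."""
    # Imported here so headed_user_agent() can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless")
    options.add_argument("--window-size=1920,1080")
    # Assumption: overriding the user agent is enough to avoid the headless check.
    options.add_argument(f"--user-agent={headed_user_agent(HEADLESS_UA)}")
    return webdriver.Chrome(options=options)
```

To check what the browser actually reports, call `driver = build_driver()` and then `driver.execute_script("return navigator.userAgent")`; remember to call `driver.quit()` afterwards.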

undetected Selenium