
I'm trying to grab the entire HTML page of this website's forum. The comment section only loads as you scroll down. After some scrolling, you will find that eventually (on page 4) a Load Next Page button appears that you must click to get the subsequent comments. After much searching, the code below works quite well to get to the final page of comments. Much of it is taken from this stackoverflow post and this one as well.

For reference, I am on Windows 10 and my Chrome driver version is 76.0.3809.132. I also used PhantomJS just to see which one would load quicker. Both driver .exe files are placed in the same directory as the one I'm executing the script from. I have not encountered any issues up until today.

import time  # needed for time.sleep() below
import selenium.webdriver as webdriver
from selenium.webdriver.chrome.options import Options

def scrollDownAllTheWay(driver):
    last_height = driver.execute_script("return document.body.scrollHeight")

    while True:
        driver.execute_script("window.scrollTo(0, 100*document.body.scrollHeight);")

        time.sleep(3)

        if "Load next page</button>" in driver.page_source:
            driver.find_element_by_css_selector('.myButton').click()

        new_height = driver.execute_script("return document.body.scrollHeight")

        if new_height == last_height:
            break
        last_height = new_height

#Load this and comment out chrome headless code below, if needed.
#driver = webdriver.PhantomJS()

#Chrome driver
options = Options()
options.add_argument("--headless")
driver = webdriver.Chrome(options=options)

driver.get("https://www.chessable.com/discussion/thread/58883/official-chessable-launch-schedule-2019/")

scrollDownAllTheWay(driver)

When I run the script above with webdriver.PhantomJS() (replacing the Chrome portion with it), I have no issues. The function runs until the headless browser reaches the last page. Great.

When I run the script above with webdriver.Chrome() headless, I run into the following error:

ElementClickInterceptedException: Message: element click intercepted: Element <button id="load-next-comments" class="myButton">...</button> is not clickable at point (388, 23). Other element would receive the click: <div class="headerHolder">...</div>   (Session info: headless chrome=76.0.3809.132)

I couldn't find anything helpful to solve this problem. Even stranger, if you comment out the options.add_argument("--headless") line, the page loads just fine and the script scrolls through the entire page. I can see the final clicks execute in my local Chrome browser, and the scrolling and clicking stop once it has completed.

Question: Why is the headless Chrome session not properly working here, but the non-headless version is?

Edit: I just found this post, which could be potentially helpful, but I am not sure.

Note: I'm open to using other browser drivers like Firefox() or anything else as a potential workaround, but the question still remains.

InfiniteFlash
  • Use this URL: `https://www.chessable.com/ajax/discussionComments.php?threadId=58883&page=1&perPage=500&order=best`. Simple! You really don't need Selenium or a headless browser if you only want the comments; I would suggest using the requests library. – Dev Sep 03 '19 at 21:50
  • @Dev Wow, you saved me a lot of work. Thanks! Would you be able to share any insights about ajax? How did you find/construct that URL? I'm in my infancy of webscraping. – InfiniteFlash Sep 03 '19 at 22:07
  • Press F12 for developer tools, then click on the Network tab. You can also filter for XHRs. – pguardiario Sep 04 '19 at 00:44
  • @pguardiario Thanks to both of you. I appreciate it. This works as a nice alternative to the OP. I would have never thought about this. – InfiniteFlash Sep 04 '19 at 00:52
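
The endpoint from the first comment can be requested directly, without a browser. The following is a minimal sketch: the query parameter names are copied from that URL, but the response format (HTML fragment or JSON) is not shown in this thread, so the body is simply returned as raw text. The function names are hypothetical, and the fetch step assumes the third-party requests library is installed.

```python
from urllib.parse import urlencode

# Endpoint taken from the comment above.
BASE_URL = "https://www.chessable.com/ajax/discussionComments.php"


def build_comments_url(thread_id, page=1, per_page=500, order="best"):
    """Build the comments URL using the parameter names seen in the comment."""
    query = urlencode({
        "threadId": thread_id,
        "page": page,
        "perPage": per_page,
        "order": order,
    })
    return "{}?{}".format(BASE_URL, query)


def fetch_comments(thread_id, **kwargs):
    """Return the raw response body; its format is not known from this thread."""
    import requests  # third-party; imported lazily so the URL helper works without it

    response = requests.get(build_comments_url(thread_id, **kwargs), timeout=30)
    response.raise_for_status()
    return response.text


print(build_comments_url(58883))
```

Calling fetch_comments(58883) would then return whatever the server sends back; inspect it in the Network tab first to decide how to parse it.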

4 Answers

There's an element on top of that button that's making it not clickable. If you change:

driver.find_element_by_css_selector('.myButton').click()

to

driver.execute_script("document.querySelector('.myButton').click()")

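it should work, because a click dispatched from JavaScript does not check whether another element overlaps the button. In fact, doing everything from JavaScript is not a bad idea unless you're doing "QA testing".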
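
A slightly more defensive variant of the same idea, as a sketch: the selector is passed as a script argument, and the helper reports whether a matching element was found, so a missing button does not raise a JavaScript error. The name js_click is hypothetical; it only relies on the driver's execute_script method, which exposes extra arguments to the script as arguments[0], arguments[1], and so on.

```python
def js_click(driver, css_selector):
    """Click the first element matching css_selector from JavaScript.

    Returns True if an element was found and clicked, False otherwise.
    """
    script = (
        "var el = document.querySelector(arguments[0]);"
        "if (el) { el.click(); return true; }"
        "return false;"
    )
    return bool(driver.execute_script(script, css_selector))


# Usage inside the scrolling loop (driver is a Selenium WebDriver instance):
# if "Load next page</button>" in driver.page_source:
#     js_click(driver, ".myButton")
```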

pguardiario

JavaScript is not required. If you set the window size in headless mode, it will click the Load next page button. Hope this helps.

import time  # needed for time.sleep() below
import selenium.webdriver as webdriver
from selenium.webdriver.chrome.options import Options

def scrollDownAllTheWay(driver):
    last_height = driver.execute_script("return document.body.scrollHeight")

    while True:
        driver.execute_script("window.scrollTo(0, 100*document.body.scrollHeight);")

        time.sleep(3)

        if "Load next page</button>" in driver.page_source:

            driver.find_element_by_css_selector('.myButton').click()
            print('clicked')

        new_height = driver.execute_script("return document.body.scrollHeight")

        if new_height == last_height:
            break
        last_height = new_height


options = Options()
options.add_argument("--headless")
options.add_argument('window-size=1920x1080')
driver = webdriver.Chrome(options=options)

driver.get("https://www.chessable.com/discussion/thread/58883/official-chessable-launch-schedule-2019/")

scrollDownAllTheWay(driver)

To verify whether the code is working, take a screenshot before and after each click; comparing them will show that the click went through.

import time  # needed for time.sleep() below
import selenium.webdriver as webdriver
from selenium.webdriver.chrome.options import Options

def scrollDownAllTheWay(driver):
    last_height = driver.execute_script("return document.body.scrollHeight")
    i = 1
    while True:
        driver.execute_script("window.scrollTo(0, 100*document.body.scrollHeight);")

        time.sleep(3)

        if "Load next page</button>" in driver.page_source:
            driver.save_screenshot("screenshot_{}.png".format(i))
            i = i+1
            driver.find_element_by_css_selector('.myButton').click()
            driver.save_screenshot("screenshot_{}.png".format(i))
            i = i + 1
            print('clicked')

        new_height = driver.execute_script("return document.body.scrollHeight")

        if new_height == last_height:
            break
        last_height = new_height


options = Options()
options.add_argument("--headless")
options.add_argument('window-size=1920x1080')
driver = webdriver.Chrome(options=options)

driver.get("https://www.chessable.com/discussion/thread/58883/official-chessable-launch-schedule-2019/")

scrollDownAllTheWay(driver)
KunduK

I had the same issue with Chromedriver.

Solved it by adding these options to my code:

options.add_argument("--window-size=1920,1080")
options.add_argument("--start-maximized")
options.add_argument("--headless")

PS: I found the solution here: https://github.com/SeleniumHQ/selenium/issues/4685
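
To confirm that the window-size flag actually took effect in headless mode, the viewport can be read back from the page. This is a small sketch with a hypothetical helper name, viewport_size; it relies only on execute_script and the standard window.innerWidth and window.innerHeight properties. The reported viewport may be somewhat smaller than the requested window size.

```python
def viewport_size(driver):
    """Return (width, height) of the page viewport as reported by the browser."""
    width, height = driver.execute_script(
        "return [window.innerWidth, window.innerHeight];"
    )
    return int(width), int(height)


# Usage (driver is a Selenium WebDriver instance created with the options above):
# print(viewport_size(driver))
```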

glaucon

I recently stumbled upon this problem as well. After much debugging and research, I found a logical explanation for why the code works in non-headless mode but gives the error you describe in headless mode.

In headless mode, if you don't specify the size of the Chrome window (for example with driver.set_window_rect(width=1200, height=900)), the page may be laid out so that a pop-up or other element covers the button and prevents it from being clicked.

So, ideally, giving the window an explicit size lets every element settle in its intended place and keeps the button you want to click from being hidden.

Specifying the window size worked for me and I think it should work for you as well.
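
One way to check whether something really is covering the button is to ask the browser which element sits at the button's centre point. The sketch below uses a hypothetical helper, describe_covering_element, built on the standard DOM calls getBoundingClientRect and elementFromPoint. It returns None when nothing covers the target, and also when the target is missing or its centre is outside the viewport, so treat a None result as "no overlap detected" rather than proof.

```python
# JavaScript run in the page: find what element is on top at the target's centre.
_COVER_CHECK_JS = """
var el = document.querySelector(arguments[0]);
if (!el) { return null; }
var r = el.getBoundingClientRect();
var hit = document.elementFromPoint(r.left + r.width / 2, r.top + r.height / 2);
if (!hit || hit === el || el.contains(hit)) { return null; }
return hit.outerHTML.slice(0, 200);
"""


def describe_covering_element(driver, css_selector):
    """Return an HTML snippet of the element covering the target, or None."""
    return driver.execute_script(_COVER_CHECK_JS, css_selector)


# Usage (driver is a Selenium WebDriver instance):
# print(describe_covering_element(driver, ".myButton"))
```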

Mr. Techie