I'm trying to grab the entire HTML of this website's forum page. The comment section only loads as you scroll down. After some scrolling (on page 4), a "Load next page" button appears, and you must click it to get the following comments. After much searching, the code below works quite well to get to the final page of comments. Much of it is taken from this stackoverflow post and this one as well.
For reference, I am on Windows 10 and my ChromeDriver version is 76.0.3809.132. I also used PhantomJS just to see which one would load quicker. Both driver .exe files are placed in the same directory as the one I'm executing the script from. I have not encountered any issues until today.
import time

import selenium.webdriver as webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By


def scrollDownAllTheWay(driver):
    """Keep scrolling (and clicking "Load next page") until the page stops growing."""
    last_height = driver.execute_script("return document.body.scrollHeight")
    while True:
        # Jump to the bottom so the lazily loaded comments are requested
        driver.execute_script("window.scrollTo(0, 100*document.body.scrollHeight);")
        time.sleep(3)
        # From page 4 onward, a "Load next page" button has to be clicked
        if "Load next page</button>" in driver.page_source:
            driver.find_element(By.CSS_SELECTOR, '.myButton').click()
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break
        last_height = new_height


# Load this and comment out the headless Chrome code below, if needed.
# driver = webdriver.PhantomJS()

# Chrome driver
options = Options()
options.add_argument("--headless")
driver = webdriver.Chrome(options=options)
driver.get("https://www.chessable.com/discussion/thread/58883/official-chessable-launch-schedule-2019/")
scrollDownAllTheWay(driver)
When I run the script above with webdriver.PhantomJS() (replacing the Chrome portion with it), I have no issues. The function runs until the headless browser reaches the last page. Great.
When I run the script above with headless webdriver.Chrome(), I run into the following error:
ElementClickInterceptedException: Message: element click intercepted: Element <button id="load-next-comments" class="myButton">...</button> is not clickable at point (388, 23). Other element would receive the click: <div class="headerHolder">...</div> (Session info: headless chrome=76.0.3809.132)
I couldn't find anything helpful for solving this problem. Even stranger, if you disable the options.add_argument("--headless") part (comment it out), the page loads just fine and the script scrolls through the entire page. I can see the final clicks execute in my local Chrome browser, and then see it stop scrolling and clicking once it has finished.
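One plausible explanation, offered as an assumption rather than a confirmed diagnosis: headless Chrome typically starts with a small default window (800x600), while a normal desktop Chrome window is much larger. With the smaller viewport the layout differs, and the button apparently ends up at the very top of the screen (note y = 23 in the error), where the page's `headerHolder` div sits on top of it. A minimal sketch of giving the headless session an explicit, desktop-sized window is below; the helper names `configure_headless_chrome` and `viewport_size` and the 1920x1080 size are hypothetical choices, not part of the original script.

```python
def configure_headless_chrome(options, width=1920, height=1080):
    """Add headless mode plus an explicit window size to a Chrome Options object.

    Works with selenium.webdriver.chrome.options.Options, or with anything
    else that exposes an add_argument(str) method.
    """
    options.add_argument("--headless")
    # Assumption: without this flag headless Chrome uses a small default
    # window, so the page layout differs from a normal desktop window.
    options.add_argument(f"--window-size={width},{height}")
    return options


def viewport_size(driver):
    """Return (innerWidth, innerHeight) of the current page, handy for
    comparing a headless run against a non-headless one."""
    width, height = driver.execute_script(
        "return [window.innerWidth, window.innerHeight];"
    )
    return width, height


# Usage with the question's script (requires Selenium and chromedriver):
# from selenium import webdriver
# from selenium.webdriver.chrome.options import Options
# options = configure_headless_chrome(Options())
# driver = webdriver.Chrome(options=options)
# driver.get("https://www.chessable.com/discussion/thread/58883/official-chessable-launch-schedule-2019/")
# print(viewport_size(driver))
# scrollDownAllTheWay(driver)
```

Printing `viewport_size(driver)` in both modes is a quick way to check whether the window size is really what differs between the two runs.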
Question: Why is the headless Chrome session not properly working here, but the non-headless version is?
Edit: I just found this post, which could potentially be helpful, but I am not sure.
Note: I'm open to using other browser drivers like Firefox() or anything else as a potential workaround, but the question still remains.
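A workaround that stays on Chrome is commonly used for `ElementClickInterceptedException`: click the button through JavaScript instead of with a native WebDriver click. A JavaScript `click()` is dispatched on the element itself, so it does not matter what is drawn on top of it at a given screen coordinate. The sketch below rewrites `scrollDownAllTheWay` under that assumption. It locates the button by the `load-next-comments` id shown in the error message and uses `find_elements` (which returns an empty list when nothing matches) instead of searching `page_source`. The function name, the `pause` parameter, and the extra wait after the click are hypothetical choices. Keep in mind that a JavaScript click skips the checks a real user click would go through, so it sidesteps the symptom rather than explaining why headless mode differs.

```python
import time

# Same value as selenium.webdriver.common.by.By.CSS_SELECTOR; using the
# plain string keeps this sketch importable without Selenium installed.
CSS_SELECTOR = "css selector"


def scroll_down_all_the_way(driver, button_selector="#load-next-comments", pause=3.0):
    """Scroll to the bottom repeatedly, clicking the "Load next page" button
    whenever it is present, until the page height stops growing."""
    last_height = driver.execute_script("return document.body.scrollHeight")
    while True:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give lazily loaded comments time to arrive
        buttons = driver.find_elements(CSS_SELECTOR, button_selector)
        if buttons:
            # Click via JavaScript so an overlapping element such as a fixed
            # header cannot intercept it the way it can a native click.
            driver.execute_script("arguments[0].click();", buttons[0])
            time.sleep(pause)  # wait for the next page of comments
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break
        last_height = new_height
```

With the question's setup, this would be called as `scroll_down_all_the_way(driver)` in place of `scrollDownAllTheWay(driver)`.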