I use a local pywb instance (you might know it as https://webrecorder.io/) for archiving/recording webpages, and the recording has to be done through a browser. But some pages only load more content when you scroll down.
There is a button at the bottom labelled "Click to load more". When I visit the site directly, this script can click it and the page scrolls down:
$ cat scraper.py
from selenium import webdriver
from time import sleep
from selenium.webdriver.common.by import By

# XPath of the "Click to load more" button at the bottom of the page
LOAD_MORE = "//infinite-scroll/div[contains(text(),'Click to load more')]"

driver = webdriver.Chrome()
driver.get("https://www.minds.com/medworthy")
sleep(20)  # give the page time to load before looking for the button

# find_element raises NoSuchElementException once the button is gone,
# which ends the loop with an error
while driver.find_element(By.XPATH, LOAD_MORE):
    print("click")
    driver.find_element(By.XPATH, LOAD_MORE).click()
    print("sleep 3 sec")
    sleep(3)
(...)
But if I change the URL in the script to point at the local pywb recording address http://localhost:8080/minds-selenium/record/https://www.minds.com/medworthy, the page does not scroll down and I get this error:
$ python3 init-scraper.py
click
Traceback (most recent call last):
  File "init-scraper.py", line 13, in <module>
    driver.find_element("xpath","//infinite-scroll/div[contains(text(),'Click to load more')]").click()
  File "/usr/local/lib/python3.5/dist-packages/selenium/webdriver/remote/webelement.py", line 80, in click
    self._execute(Command.CLICK_ELEMENT)
  File "/usr/local/lib/python3.5/dist-packages/selenium/webdriver/remote/webelement.py", line 628, in _execute
    return self._parent.execute(command, params)
  File "/usr/local/lib/python3.5/dist-packages/selenium/webdriver/remote/webdriver.py", line 320, in execute
    self.error_handler.check_response(response)
  File "/usr/local/lib/python3.5/dist-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: element is not attached to the page document
  (Session info: chrome=68.0.3440.75)
  (Driver info: chromedriver=2.41.578700 (2f1ed5f9343c13f73144538f15c00b370eda6706),platform=Linux 4.9.0-7-amd64 x86_64)
I have disabled frames so that I can scroll to the bottom of the page, similar to this. But I still want to load more content by clicking the button, and have the loop stop with an error once that text is no longer displayed, so the script can go on to do something else.
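A minimal sketch of that behaviour, assuming the button is simply replaced in the DOM each time new content loads (not verified against pywb): look the button up once per iteration, treat a stale reference as "look it up again", and stop when the lookup itself fails. `click_until_gone` and its parameters are hypothetical names; the locator and the exception types are passed in, so the loop itself does not depend on a browser.

```python
import time


def click_until_gone(locate, stale_errors, gone_errors, pause=3.0, max_stale_retries=5):
    """Hypothetical helper: keep clicking the element returned by locate().

    - locate() is called afresh on every iteration, so an element reference
      is never reused after the page may have re-rendered.
    - If click() raises one of stale_errors, wait and look the element up
      again, giving up after max_stale_retries consecutive stale clicks.
    - If locate() raises one of gone_errors, the button is gone: return the
      number of successful clicks so the caller can carry on.
    """
    clicks = 0
    stale_in_a_row = 0
    while True:
        try:
            element = locate()
        except gone_errors:
            return clicks
        try:
            element.click()
        except stale_errors:
            stale_in_a_row += 1
            if stale_in_a_row > max_stale_retries:
                raise  # keeps going stale: surface the original error
            time.sleep(pause)
            continue
        stale_in_a_row = 0
        clicks += 1
        time.sleep(pause)  # let the next batch of content load


# Assumed usage with Selenium (needs a running driver, so shown as comments):
# from selenium.common.exceptions import (
#     NoSuchElementException, StaleElementReferenceException)
# xpath = "//infinite-scroll/div[contains(text(),'Click to load more')]"
# n = click_until_gone(lambda: driver.find_element("xpath", xpath),
#                      (StaleElementReferenceException,),
#                      (NoSuchElementException,))
# print("clicked", n, "times; moving on")
```

Returning a count instead of letting `NoSuchElementException` escape lets the script continue normally once the button disappears. If the page were still wrapped in a frame, the lookup would presumably fail on the first try and the helper would return 0, so it is worth checking that count.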
Is there a way to interact with the page like this even when it is loaded through an archiving service?