I am new to Selenium and still have the question below after searching for solutions.
I am trying to access all the links on this website (https://www.ecb.europa.eu/press/pressconf/html/index.en.html).
The individual links get loaded in a "lazy-load" fashion: they appear gradually as the user scrolls down the page.
import re
import time

from selenium import webdriver

driver = webdriver.Chrome("chromedriver.exe")
driver.get("https://www.ecb.europa.eu/press/pressconf/html/index.en.html")

# scroll until the page height stops growing
last_height = driver.execute_script("return document.body.scrollHeight")
pause = 0.5
while True:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(pause)
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height
print(last_height)

# collect the press-conference links
elems = driver.find_elements_by_xpath("//a[@href]")
for elem in elems:
    url = elem.get_attribute("href")
    if re.search(r'is\d+\.en\.html', url):
        print(url)
However, this only gets the required links from the last lazy-loaded batch; everything before it is not obtained because those elements are not loaded.
How can I make sure all lazy-loaded elements have loaded before executing any scraping code?
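One workaround I have been sketching (not yet tested against the live site) is to harvest the matching links inside every scroll iteration, instead of only once at the end, so links are captured while their batch is still in the DOM. The helper name `collect_lazy_links` is mine; it reuses the same `find_elements_by_xpath` call and regex as above:

```python
import re
import time


def collect_lazy_links(driver, pause=0.5):
    """Scroll to the bottom repeatedly, harvesting matching links on
    every pass so nothing is lost if earlier batches are replaced."""
    seen = set()
    last_height = driver.execute_script("return document.body.scrollHeight")
    while True:
        # harvest before scrolling further, while this batch is in the DOM
        for elem in driver.find_elements_by_xpath("//a[@href]"):
            url = elem.get_attribute("href")
            if url and re.search(r"is\d+\.en\.html", url):
                seen.add(url)
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break
        last_height = new_height
    return sorted(seen)
```

Would something like this be reliable, or is there a proper way to wait for all lazy-loaded content first?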
Many thanks