
Hello everyone, I'm trying to use Selenium and Scrapy to scrape some information from https://answers.yahoo.com/dir/index/discover?sid=396545663

I have tried different methods. I use Selenium with PhantomJS as the driver. The page uses infinite scroll, so to scroll down I use this instruction:

elem.send_keys(Keys.PAGE_DOWN)

This simulates pressing the Page Down key. I use it instead of the JavaScript function:

browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")

because that one seems to load fewer elements in the page.

The main problem is: how can I know when I have reached the bottom of the page? It is an infinite scroll page, so I can't tell when it ends. I need to keep scrolling down, but there is no element at the bottom that I can check for.

Right now I use a timed loop, but it feels really stupid.

Thanks

RedVelvet

2 Answers


I would actually look for that "Loading..." indicator. Wait for it to become visible after every scroll. If you get a TimeoutException, there was no loading indicator this time, so there are no more items to load.

Sample implementation:

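from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# `browser` is the existing WebDriver instance from the question
wait = WebDriverWait(browser, 10)

while True:
    # scroll to the current bottom of the page to trigger the next batch
    browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")

    try:
        # the indicator should show up while more items are being fetched
        wait.until(EC.visibility_of_element_located((By.XPATH, "//*[. = 'Loading...']")))
    except TimeoutException:
        break  # no indicator appeared, so no more posts were loaded: exit the loop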

Not tested.
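If the indicator turns out to be unreliable, a different stopping rule is to compare the page height before and after each scroll. The sketch below is an assumption on my side, not something verified against Yahoo Answers: `scroll_until_height_stable`, `pause` and `max_rounds` are hypothetical names, and it only relies on `browser.execute_script`, so any WebDriver instance should fit.

```python
import time


def scroll_until_height_stable(browser, pause=2.0, max_rounds=200, sleep=time.sleep):
    """Scroll to the bottom until document.body.scrollHeight stops growing.

    Returns the number of scrolls after which the page grew.
    `pause` and `max_rounds` are illustrative defaults, not tuned values.
    """
    last_height = browser.execute_script("return document.body.scrollHeight;")
    loaded = 0
    for _ in range(max_rounds):
        browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        sleep(pause)  # give the page time to fetch and render new items
        new_height = browser.execute_script("return document.body.scrollHeight;")
        if new_height == last_height:
            break  # height did not change: assume nothing more was loaded
        last_height = new_height
        loaded += 1
    return loaded
```

Called as `scroll_until_height_stable(browser)`, it stops on the first scroll that does not make the page taller. A slow response can look like the end of the page, so `pause` has to be generous enough for the site.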

alecxe
  • Thanks for your response, however Yahoo doesn't have this kind of icon or any loading indicator. – RedVelvet Oct 05 '15 at 08:06
  • @RedVelvet it does. At the bottom, when you scroll, look for the "Loading ..." element that appears; it has `id="ya-infinite-scroll-message"` and the text "Loading ...". – alecxe Oct 05 '15 at 09:17
  • Thanks @alecxe, I used wait.until(EC.visibility_of_element_located((By.ID, "ya-infinite-scroll-message"))) and it works, but it stops after 80 questions... it's strange. – RedVelvet Oct 05 '15 at 15:20
  • EDIT: It's a great solution. The fault lies with the website, which seems to load a different number of elements each time, so the result changes on every run. – RedVelvet Oct 10 '15 at 13:08
  • @RedVelvet yeah, I am afraid that waiting for the "loading" indicator might not be reliable enough. I've seen your follow-up question and will take a look if I find time. Thanks. – alecxe Oct 11 '15 at 03:14
  • Thanks for your help, I'm a novice with this technique. If you find a solution, please answer in the follow-up question. :) – RedVelvet Oct 11 '15 at 10:39

For example, you could create a parallel thread which watches the page for AJAX requests. If more than 10 seconds pass without a new request, you are at the end of the page. I have no other idea.
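A rough sketch of that idea, simplified to a single polling loop instead of a separate thread. It assumes the browser exposes the Resource Timing API (`window.performance.getEntriesByType`), which may not hold for PhantomJS; `scroll_until_idle`, `count_script`, `idle_timeout` and `poll` are hypothetical names chosen for illustration.

```python
import time

# Assumption: the browser supports the Resource Timing API.
RESOURCE_COUNT_JS = "return window.performance.getEntriesByType('resource').length;"


def scroll_until_idle(browser, idle_timeout=10.0, poll=0.5,
                      count_script=RESOURCE_COUNT_JS,
                      clock=time.monotonic, sleep=time.sleep):
    """Keep scrolling until the counter stays unchanged for idle_timeout seconds.

    Returns the last value reported by count_script.
    """
    last_count = browser.execute_script(count_script)
    last_activity = clock()
    while clock() - last_activity < idle_timeout:
        browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        sleep(poll)
        count = browser.execute_script(count_script)
        if count != last_count:
            # a new request was recorded, so reset the idle timer
            last_count = count
            last_activity = clock()
    return last_count
```

Browsers cap the resource timing buffer, so on a long session the default counter can stop growing even though requests continue. Passing a `count_script` that counts the question elements on the page instead would avoid that.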

Andrew_STOP_RU_WAR_IN_UA