I am working with Python Selenium and the Chrome WebDriver on Windows 8. I ran into a page that loads data via AJAX while scrolling. I tried injecting jQuery, but neither that nor the approaches in the following links worked for me: Link 1, Link 2, Link 3.

Can anyone point me in the right direction?

EDIT-------------

This is my partial code after alecxe's answer

    nam = driver.find_element(By.CLASS_NAME ,'_wu')

    #get length of review
    revcnt = driver.find_element(By.XPATH ,"//span[@class='_Mnc _yz']")
    revcnt = int(revcnt.text.replace(" reviews","").strip())
    print revcnt
    # wait for reviews to appear
    wait = WebDriverWait(driver, 10)
    wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "span.review-snippet")))
    #reviews=[]
    while True:
        reviews = driver.find_elements_by_css_selector("div._ju")
        if len(reviews)<revcnt:
            driver.execute_script("arguments[0].scrollIntoView();", reviews[-1])
        else:
            driver.quit()
        print len(reviews)

But I cannot escape from the while loop!

I tried it.

1 Answer

Make a loop, and on every iteration scroll the last "review" in the list into view (works for me):

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://www.google.com/search?q=a1%20plumbing%20boise&gws_rd=ssl#gws_rd=ssl&lrd=0x54aeff4cb0b24461:0x23720b81e2bed658,1")

# wait for reviews to appear
wait = WebDriverWait(driver, 10)
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "span.review-snippet")))

while True:
    # re-query the reviews and scroll the last one into view to trigger the next load
    reviews = driver.find_elements(By.CSS_SELECTOR, "div._ju")
    driver.execute_script("arguments[0].scrollIntoView();", reviews[-1])

Note that the loop is endless here, so you need to figure out how to exit it. For instance, you can count the reviews before and after scrolling into view and exit the loop if no more reviews were loaded. Or you can check whether the spinning circle is present: once it no longer shows up on scroll, there are no more reviews left to load.
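The count-based exit could be sketched roughly as follows. This is only an illustration: the helper name `scroll_until_count_stops` and its parameters are hypothetical, the fixed pause is an assumption about how long the AJAX request takes, and the Selenium calls in the trailing comment reuse the selectors from the code above.

```python
import time


def scroll_until_count_stops(get_items, scroll_to, pause=1.0, max_rounds=100):
    """Scroll to the last item until a scroll adds no new items.

    get_items: callable returning the currently loaded items (hypothetical hook)
    scroll_to: callable that scrolls the given item into view (hypothetical hook)
    """
    items = get_items()
    for _ in range(max_rounds):  # safety cap so the loop can never be endless
        if not items:
            break
        scroll_to(items[-1])
        time.sleep(pause)  # assumed to be long enough for the next batch to load
        new_items = get_items()
        if len(new_items) == len(items):
            break  # nothing new was loaded, so stop scrolling
        items = new_items
    return items


# Possible wiring with Selenium, assuming `driver` and `By` from the code above:
# reviews = scroll_until_count_stops(
#     get_items=lambda: driver.find_elements(By.CSS_SELECTOR, "div._ju"),
#     scroll_to=lambda el: driver.execute_script(
#         "arguments[0].scrollIntoView();", el),
# )
```

Because the exit condition compares two counts taken from the page itself, it does not depend on the review total printed on the page.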

Here is one idea for detecting that there are no more reviews to load: check whether the review dialog's scroll height is unchanged after the next scroll. This is close to how a human would detect it:

import time

dialog = driver.find_element(By.CSS_SELECTOR, "div.review-dialog-list")
last_scroll_height = 0

while True:
    # re-query the reviews and scroll the last one into view to trigger the next load
    reviews = driver.find_elements(By.CSS_SELECTOR, "div._ju")
    driver.execute_script("arguments[0].scrollIntoView();", reviews[-1])

    # adding artificial delay (don't tell anyone I'm using sleep here)
    time.sleep(1)

    # if scroll height has not changed - exit
    scroll_height = driver.execute_script("return arguments[0].scrollHeight;", dialog)
    if scroll_height == last_scroll_height:
        break
    last_scroll_height = scroll_height

print(len(reviews))

I don't like having time.sleep() here; I hope you'll find a better way to tackle the problem.
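One possible way to avoid the fixed sleep is to poll until the number of loaded reviews grows, and to treat a timeout as "nothing more to load". The sketch below is a plain-Python illustration; `wait_for_growth` and its parameters are hypothetical names, and the timeout value is an assumption that would need tuning for the real page.

```python
import time


def wait_for_growth(get_count, previous, timeout=5.0, poll=0.2):
    """Poll until get_count() exceeds `previous`.

    Returns the new count, or None if nothing grew before the timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        current = get_count()
        if current > previous:
            return current  # new items appeared, no need to wait any longer
        if time.monotonic() >= deadline:
            return None  # assume the list is fully loaded
        time.sleep(poll)


# Selenium's WebDriverWait can express the same idea, because until() accepts
# any callable that takes the driver (assuming `driver`, `By`, and `count`):
# from selenium.common.exceptions import TimeoutException
# try:
#     WebDriverWait(driver, 5).until(
#         lambda d: len(d.find_elements(By.CSS_SELECTOR, "div._ju")) > count)
# except TimeoutException:
#     pass  # no new reviews appeared, so stop scrolling
```

With this approach the loop only waits as long as the page actually needs, and pays the full timeout once, on the final scroll.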

alecxe
  • Thanks. Could you tell me how to exit the loop? The number of reviews varies from link to link, so I do not know the length in advance. –  Sep 16 '15 at 21:49
  • I cannot escape from the while loop! I am using a counter to exit the loop, but in vain... –  Sep 17 '15 at 18:03
  • @SAZ yeah, it's kind of tricky, I've added this problem to my TODO list. Will get back to you later. Thanks! – alecxe Sep 17 '15 at 20:53
  • @alecxe Counting does not work because the displayed count is not accurate for every URL. For example, the link above actually has 124 reviews, but the page says 125. On the other hand, this page https://www.google.com/search?q=Albert+Nahman+Plumbing#lrd=0x80857e798a45fe1d:0x3cc6b3be2874dea2,1 has the correct count. I think it is better to check for the presence of the spinner, but how do I track such a short-lived element? –  Sep 17 '15 at 22:56
  • @SAZ updated with an option to stop scrolling if no new reviews were loaded. – alecxe Sep 18 '15 at 03:50
  • A better way to check whether the page has been scrolled to the bottom is to compare the last element's text, id, or href across consecutive scrolls. If it is the same, you have reached the bottom, so stop scrolling! – Super-ilad Mar 23 '20 at 11:43
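
The last comment's suggestion might look something like the sketch below. The function name and both hooks are hypothetical, and the pause is an assumed load time. Note also that two different reviews with identical text would stop the loop early, so an id or href is probably a safer key than the text.

```python
import time


def scroll_until_last_item_repeats(get_last_key, scroll_once, pause=1.0,
                                   max_rounds=100):
    """Scroll until the last item's key is the same on two consecutive checks.

    get_last_key: callable returning text/id/href of the last item (hypothetical)
    scroll_once:  callable that scrolls the last item into view (hypothetical)
    Returns the number of scrolls that were performed.
    """
    previous = None
    for rounds in range(max_rounds):
        current = get_last_key()
        if rounds > 0 and current == previous:
            return rounds  # same last item as before the scroll: bottom reached
        previous = current
        scroll_once()
        time.sleep(pause)  # assumed to be long enough for new items to load
    return max_rounds


# Possible wiring with Selenium, assuming `driver` and `By` from the answer:
# def last_text():
#     items = driver.find_elements(By.CSS_SELECTOR, "div._ju")
#     return items[-1].text if items else None
#
# def scroll_last():
#     items = driver.find_elements(By.CSS_SELECTOR, "div._ju")
#     if items:
#         driver.execute_script("arguments[0].scrollIntoView();", items[-1])
#
# scroll_until_last_item_repeats(last_text, scroll_last)
```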