
I am scraping usernames from this webpage, which loads more users only as you scroll down.

URL of the page: http://www.quora.com/Kevin-Rose/followers

I know the number of users on the page (in this case, 43,812). How can I scroll the page until all the users are loaded? I have searched the internet, and almost everywhere I find the same line of code for doing it:

driver.execute_script("window.scrollTo(0, )")

How can I determine the vertical position to ensure that all the users are loaded? Is there any other option to achieve the same thing without actually scrolling?
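A minimal sketch of one common approach, assuming a Selenium `driver` that is already on the followers page: instead of guessing a fixed Y value, scroll to the current `document.body.scrollHeight` and stop once the height no longer grows after a pause. `scroll_to_bottom`, `pause` and `max_rounds` are hypothetical names; on a slow connection a batch may not arrive within `pause`, which would end the loop early, so the pause may need tuning.

```python
import time


def scroll_to_bottom(driver, pause=2.0, max_rounds=5000):
    """Scroll to the bottom repeatedly until the page height stops growing.

    Returns the number of scrolls performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for rounds in range(1, max_rounds + 1):
        # Scroll to whatever the current bottom is, not a fixed Y coordinate
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the next batch time to load
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            return rounds  # the page did not grow, so nothing new was loaded
        last_height = new_height
    return max_rounds
```

Usage, after opening the followers page: `scroll_to_bottom(driver)`.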

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Firefox()
driver.get('http://www.quora.com/')

wait = WebDriverWait(driver, 10)

# Explicit wait for the login form instead of fixed time.sleep() calls
form = wait.until(EC.presence_of_element_located((By.CLASS_NAME, 'regular_login')))

username = form.find_element(By.NAME, 'email')
username.send_keys('abc@gmail.com')

password = form.find_element(By.NAME, 'password')
password.send_keys('def')
password.send_keys(Keys.RETURN)

# Wait for the search box that appears after logging in
search = wait.until(EC.presence_of_element_located((By.XPATH, "//form[@name='search_form']//input[@name='search_input']")))

search.clear()
search.send_keys('Kevin Rose')
search.send_keys(Keys.RETURN)

# Wait until the element is loaded (asynchronously loaded webpage)
link = wait.until(EC.presence_of_element_located((By.LINK_TEXT, "Kevin Rose")))
link.click()

# The profile opens in a new window; switch to it
handles = driver.window_handles
driver.switch_to.window(handles[1])

element = WebDriverWait(driver, 2).until(EC.presence_of_element_located((By.PARTIAL_LINK_TEXT, "Followers")))
element.click()
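One way to finish the script above, sketched under assumptions: once the followers list is open, keep scrolling to the bottom until the number of loaded follower entries reaches the count you already know. `scroll_until_count` and `FOLLOWER_SELECTOR` are hypothetical names, and the selector is only a placeholder that has to be replaced with the real one from the page's markup.

```python
import time

# Hypothetical selector for one follower entry: a placeholder only,
# inspect the followers page to find the real class name.
FOLLOWER_SELECTOR = "div.FollowerListItem"


def scroll_until_count(driver, expected, selector=FOLLOWER_SELECTOR,
                       pause=2.0, max_rounds=5000):
    """Scroll to the bottom until at least `expected` entries are loaded.

    Returns the number of entries found when the loop stops.
    """
    loaded = 0
    for _ in range(max_rounds):
        # "css selector" is the value of selenium's By.CSS_SELECTOR
        loaded = len(driver.find_elements("css selector", selector))
        if loaded >= expected:
            break
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # let the next batch load
    return loaded
```

For example, right after `element.click()`: `loaded = scroll_until_count(driver, 43812)`. The `max_rounds` cap keeps the loop from running forever if the page stops loading before the expected count is reached.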
alecxe
Siddhesh
  • There are certainly options. Please show the complete code you have now (including the scrolling part). Thanks. – alecxe Sep 16 '14 at 14:05
  • I don't think it's of any use, but I have added the code. It is just the code to log into the site and navigate to the particular page. I don't know what value to use for the Y coordinate. – Siddhesh Sep 16 '14 at 14:11



Since nothing special appears after the last batch of followers is loaded, I would rely on the fact that you know how many followers the user has and how many are loaded on each scroll (I've inspected it: 18 per scroll). From these two numbers you can calculate how many times you need to scroll the page down.
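The calculation is a ceiling division. A minimal sketch, assuming every scroll loads exactly 18 new followers (`scrolls_needed` is a hypothetical helper name):

```python
def scrolls_needed(followers_count, followers_per_scroll=18):
    """Number of scroll-downs needed to load every follower.

    Uses ceiling division, so a partial last batch still gets its own scroll.
    """
    return -(-followers_count // followers_per_scroll)


print(scrolls_needed(53))     # 3
print(scrolls_needed(43812))  # 2434
```

The `followers_count/followers_per_page + 1` used in the answer gives the same number except when the count is an exact multiple of 18 (43,812 = 18 × 2,434 is one), where it scrolls one extra time, which is harmless.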

Here's the implementation (I've used a different user with only 53 followers to demonstrate the solution):

import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

followers_per_page = 18  # followers loaded per scroll (observed on the page)

driver = webdriver.Chrome()  # webdriver.Firefox() in your case
driver.get("http://www.quora.com/Andrew-Delikat/followers")

# get the followers count, e.g. "1,234" -> 1234
element = WebDriverWait(driver, 2).until(EC.presence_of_element_located((By.XPATH, '//li[contains(@class, "FollowersNavItem")]//span[@class="profile_count"]')))
followers_count = int(element.text.replace(',', ''))
print(followers_count)

# scroll down the page iteratively with a delay so each batch can load
for _ in range(followers_count // followers_per_page + 1):
    driver.execute_script("window.scrollTo(0, 10000);")
    time.sleep(2)

Also, if the user has a large number of followers, the page can grow taller than 10000 pixels, so you may need to increase this Y coordinate on each iteration, based on the loop variable.
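A minimal sketch of that idea, assuming the same `driver` as above: the target Y grows with the loop variable (10000, 20000, 30000, ... px). `scroll_in_steps` and `step_px` are hypothetical names. Scrolling past the current end of the page is harmless, since the browser stops at the bottom.

```python
import time


def scroll_in_steps(driver, scroll_count, step_px=10000, pause=2.0):
    """Scroll one step further on every iteration: 10000, 20000, 30000, ... px.

    Returns the list of Y positions that were requested.
    """
    positions = []
    for i in range(scroll_count):
        y = (i + 1) * step_px  # the target grows with the loop variable
        # arguments[0] inside the script refers to the first extra argument (y)
        driver.execute_script("window.scrollTo(0, arguments[0]);", y)
        positions.append(y)
        time.sleep(pause)  # give the next batch time to load
    return positions
```

Usage: `scroll_in_steps(driver, followers_count // followers_per_page + 1)`.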

alecxe