
This is the only code I could find that scrolls down to the end of the page; nothing else has worked. The problem is that the `while True` loop never ends: it keeps trying to scroll down even after it reaches the bottom, so it never gets to the next step of printing. How can I end the `while True` loop and print the results? Thank you.

from selenium import webdriver

url = 'http://www.tradingview.com/screener'
driver = webdriver.Firefox()
driver.get(url)

# Get scroll height
last_height = driver.execute_script("return document.body.scrollHeight")

while True:
    # Scroll down to bottom
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

# will give a list of all tickers
tickers = driver.find_elements_by_css_selector('a.tv-screener__symbol')

for index in range(len(tickers)):
    print("Row " + tickers[index].text + " ")

Errors I'm receiving when running the code from the answer:


>>> from selenium import webdriver
>>> url = 'http://www.tradingview.com/screener'
>>> driver = webdriver.Firefox()
>>> driver.get(url)
>>>
>>> # Get scroll height
... last_height = driver.execute_script("return document.body.scrollHeight")
>>>
>>> selector = '.js-field-total.tv-screener-table__field-value--total'
>>> matches = driver.find_element_by_css_selector(selector)
>>> matches = int(matches.text.split()[0])
>>>
>>> visible_rows = 0
>>> scrolls = 0
>>>
>>> while visible_rows < matches:
...
  File "<stdin>", line 2

    ^
IndentationError: expected an indented block
>>>     driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
  File "<stdin>", line 1
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    ^
IndentationError: unexpected indent
>>>
>>>     # Wait 10 scrolls before updating row information
...     if scrolls == 10:
  File "<stdin>", line 2
    if scrolls == 10:
    ^
IndentationError: unexpected indent
>>>         table = driver.find_elements_by_class_name('tv-data-table__tbody')
  File "<stdin>", line 1
    table = driver.find_elements_by_class_name('tv-data-table__tbody')
    ^
IndentationError: unexpected indent
>>>         visible_rows = len(table[1].find_elements_by_tag_name('tr'))
  File "<stdin>", line 1
    visible_rows = len(table[1].find_elements_by_tag_name('tr'))
    ^
IndentationError: unexpected indent
>>>         scrolls = 0
  File "<stdin>", line 1
    scrolls = 0
    ^
IndentationError: unexpected indent
>>>
>>>     scrolls += 1
  File "<stdin>", line 1
    scrolls += 1
    ^
IndentationError: unexpected indent
>>>
>>> # will give a list of all tickers
... tickers = driver.find_elements_by_css_selector('a.tv-screener__symbol')
>>>
>>> for index in range(len(tickers)):
...    print("Row " + tickers[index].text + " ")
...
  • What you could do is identify an element at the bottom of the page and call scrollToVisible on that element. Also, what language are you using? In C# we can do the following: `public static void ScrollToVisible(this IWebElement element) { var js = (IJavaScriptExecutor) Browser.Instance.WebDriver; js.ExecuteScript("arguments[0].scrollIntoView(true);", element); }` – Vaptsarov Jun 20 '18 at 11:28
  • Hi, I'm using Python. – Glenn Anderson Jun 20 '18 at 11:36
  • You can call `driver.execute_script("arguments[0].scrollIntoView();", element)` then. – Vaptsarov Jun 20 '18 at 11:39
  • That doesn't scroll, I've tried that. – Glenn Anderson Jun 20 '18 at 11:44
  • Have you looked at the following question: https://stackoverflow.com/questions/41744368/scrolling-to-element-using-webdriver – Vaptsarov Jun 20 '18 at 12:15
  • What you could do is look for an element at the bottom of the page and use the isDisplayed method; if it returns true, break out of the while loop. In Java we use `break;`, but I'm not sure about Python. – mbn217 Jun 20 '18 at 14:35
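The question's code already stores `last_height` but never uses it. A common way to end the loop, in the spirit of the comment above about breaking out of the `while` loop, is to measure the page height after each scroll and `break` once it stops growing. This is a minimal sketch, assuming the rows are appended to `document.body`; on the TradingView screener they may instead load inside an inner scrolling element, in which case this check could stop too early. The function name `scroll_to_bottom`, the `pause` value and the `max_rounds` safety cap are illustrative choices, not part of the original code.

```python
import time


def scroll_to_bottom(driver, pause=2.0, max_rounds=200, sleep=time.sleep):
    """Scroll until document.body.scrollHeight stops growing.

    `driver` is any object with an `execute_script` method (for example a
    Selenium WebDriver). `pause` gives lazy-loaded rows time to arrive.
    `max_rounds` is an assumed safety cap so the loop can never run forever.
    Returns the number of scrolls performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for rounds in range(1, max_rounds + 1):
        # Scroll down to the current bottom of the page
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        # Wait for new content to load before measuring again
        sleep(pause)
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            # Height did not change, so we assume we reached the bottom
            return rounds
        last_height = new_height
    # Cap reached without the height settling
    return max_rounds
```

With `driver` created as in the question, calling `scroll_to_bottom(driver)` in place of the `while True:` loop lets the script continue to the ticker lookup and printing.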

1 Answer


Under the ticker, the page tells you how many rows (matches) are in the table. So one option is to compare the number of visible rows with the total number of rows, and quit the loop once the number of visible rows reaches that total.

from selenium import webdriver
from selenium.webdriver.common.by import By

url = 'http://www.tradingview.com/screener'
driver = webdriver.Firefox()
driver.get(url)

# Total number of matches; the count is the first word of the element's text
selector = '.js-field-total.tv-screener-table__field-value--total'
matches = driver.find_element(By.CSS_SELECTOR, selector)
matches = int(matches.text.split()[0])

visible_rows = 0
scrolls = 0

# No blank lines inside the loop, so it can also be pasted into the interactive interpreter
while visible_rows < matches:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    # Wait 10 scrolls before updating row information
    if scrolls == 10:
        table = driver.find_elements(By.CLASS_NAME, 'tv-data-table__tbody')
        visible_rows = len(table[1].find_elements(By.TAG_NAME, 'tr'))
        scrolls = 0
    scrolls += 1

# will give a list of all tickers
tickers = driver.find_elements(By.CSS_SELECTOR, 'a.tv-screener__symbol')

for ticker in tickers:
    print("Row " + ticker.text + " ")
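The loop above only exits once `visible_rows` reaches `matches`; if the count never gets there (for example, the total changes or a batch fails to load), it keeps scrolling forever. Here is a minimal sketch of the same row-counting idea with a second exit for when the row count stops growing. It is written against two hypothetical callables, `scroll()` and `count_rows()`, so it does not depend on TradingView's class names; `pause` and `max_stalls` are assumed values.

```python
import time


def scroll_until_rows_loaded(scroll, count_rows, total, pause=2.0,
                             max_stalls=3, sleep=time.sleep):
    """Scroll until count_rows() reaches `total` or stops growing.

    `scroll` and `count_rows` are hypothetical callables supplied by the
    caller (e.g. wrappers around driver.execute_script and find_elements).
    Returns the last row count seen.
    """
    visible = count_rows()
    stalls = 0
    while visible < total and stalls < max_stalls:
        scroll()
        sleep(pause)  # give the next batch of rows time to load
        new_visible = count_rows()
        # Count consecutive scrolls that produced no new rows
        stalls = stalls + 1 if new_visible <= visible else 0
        visible = new_visible
    return visible
```

With Selenium, `scroll` could be `lambda: driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")` and `count_rows` could be `lambda: len(driver.find_elements(By.CSS_SELECTOR, 'a.tv-screener__symbol'))`.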

Edit: Since your setup doesn't seem to allow the previous solution, here's a different approach you can try. The page loads 150 rows at a time. So, instead of counting the number of visible rows, we can use the total matches/rows we're expecting (e.g. 4894) and divide that by 150 to get the number of times we need to scroll. If we scroll at least that many times, in theory, all of the rows should be visible and we can continue with the code.

from time import sleep
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

url = 'http://www.tradingview.com/screener'
# Selenium 4 style: pass the chromedriver path through a Service object
driver = webdriver.Chrome(service=Service('./chromedriver'))
driver.get(url)

try:
    selector = '.js-field-total.tv-screener-table__field-value--total'
    condition = EC.visibility_of_element_located((By.CSS_SELECTOR, selector))
    matches = WebDriverWait(driver, 10).until(condition)
    matches = int(matches.text.split()[0])
except Exception:
    # Covers TimeoutException (a subclass of Exception) and parsing errors
    print('Problem finding matches, setting default...')
    matches = 4895  # Set default

# The page loads 150 rows at a time; divide matches by
# 150 to determine the number of times we need to scroll;
# add 5 extra scrolls just to be sure
num_loops = int(matches / 150 + 5)

for _ in range(num_loops):
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    sleep(2)  # Pause briefly to allow loading time

# will give a list of all tickers
tickers = driver.find_elements(By.CSS_SELECTOR, 'a.tv-screener__symbol')

n_tickers = len(tickers)

msg = 'Correct ' if n_tickers == matches else 'Incorrect '
msg += 'number of tickers ({}) found'
print(msg.format(n_tickers))

for ticker in tickers:
    print("Row " + ticker.text + " ")
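The scroll count in this approach, `int(matches / 150 + 5)`, truncates toward zero, so a partial final batch of rows is covered only by the 5 extra scrolls. A small sketch that makes the rounding explicit with `math.ceil`, keeping the answer's figure of 150 rows per load; the helper name `scrolls_needed` is hypothetical.

```python
import math

ROWS_PER_LOAD = 150  # per the answer: the screener loads 150 rows at a time


def scrolls_needed(matches, rows_per_load=ROWS_PER_LOAD, extra=5):
    """Scrolls needed to load `matches` rows, plus `extra` safety scrolls."""
    # math.ceil rounds a partial final batch up to one more whole scroll
    return math.ceil(matches / rows_per_load) + extra


print(scrolls_needed(4894))    # → 38 (33 batches + 5 extra)
print(int(4894 / 150 + 5))     # → 37 with the original formula
```

Either value should be enough in practice; the difference is only whether the last partial batch is counted explicitly or left to the safety margin.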
T. Ray
  • Wouldn't [`visibility_of_all_elements_located`](https://selenium-python.readthedocs.io/api.html) be enough? – oldboy Jun 27 '18 at 03:04
  • It would not be enough, no. In order for that to work, you need to know which elements you expect to become visible. We know that more table rows will load as you scroll, but the table rows have identical attributes. In terms of visibility, Selenium won't be able to distinguish between loaded and to-be-loaded table rows. And even though the data is different per row, we can never be sure in which order the rows will appear. Thus, the table row elements/data do not provide a sufficient basis for knowing when to stop scrolling. – T. Ray Jun 27 '18 at 11:52
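Following that reasoning, what can be waited on is a change in the number of rows rather than their visibility. In Selenium, `WebDriverWait(driver, timeout).until(...)` accepts any callable that takes the driver, so a condition such as `lambda d: len(d.find_elements(By.CSS_SELECTOR, 'a.tv-screener__symbol')) > previous` would express this. The sketch below shows the same polling idea without Selenium, using a hypothetical `count_rows` callable so it can be checked in isolation; the timeout and poll interval are assumed values.

```python
import time


def wait_for_more_rows(count_rows, previous, timeout=10.0, poll=0.5,
                       clock=time.monotonic, sleep=time.sleep):
    """Poll count_rows() until it exceeds `previous`; return the new count.

    Raises TimeoutError if no new rows appear within `timeout` seconds,
    which can be treated as "scrolling has stopped loading data".
    """
    deadline = clock() + timeout
    while True:
        current = count_rows()
        if current > previous:
            return current
        if clock() >= deadline:
            raise TimeoutError(f"row count stayed at {current}")
        sleep(poll)
```

A scroll loop can call this after each scroll and stop on `TimeoutError`, which avoids having to tell loaded rows apart from rows that are still to come.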