Reading & Interacting With HTML Table Using Python

Question

I'm trying to scrape information from an HTML table that has interactive controls for moving through different time periods. An example table is located at this URL: http://quotes.freerealtime.com/dl/frt/M?IM=quotes&type=Time%26Sales&SA=quotes&symbol=IBM&qm_page=45750.

I'd like to start at 9:30 and then step through the table one minute at a time, exporting all of the data to a DataFrame. I've tried pandas.read_html() and BeautifulSoup, but neither works for me, although I am inexperienced with BeautifulSoup. Is this possible, or has the website protected this information from web scraping? Any help would be appreciated!

Are you interested in a selenium-specific approach? — alecxe, Jan 11 '17 at 14:13

score 1 · Accepted Answer · answered Jan 11 '17 at 21:21

The page is quite dynamic (and terribly slow, at least on my side): it relies on JavaScript and multiple asynchronous requests to get the data. Approaching that with requests would not be easy, so you might need to fall back to browser automation via, for example, selenium.

Here is something for you to get started. Note the use of Explicit Waits here and there:

import time
from io import StringIO

import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.select import Select
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC


driver = webdriver.Firefox()
driver.maximize_window()
driver.get("http://quotes.freerealtime.com/dl/frt/M?IM=quotes&type=Time%26Sales&SA=quotes&symbol=IBM&qm_page=45750")

wait = WebDriverWait(driver, 400)  # 400 seconds timeout

# wait for select element to be visible
time_select = Select(wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "select[name=time]"))))

# select 9:30 and go
time_select.select_by_visible_text("09:30")
# find_element_by_id was removed in Selenium 4; use find_element(By.ID, ...)
driver.execute_script("arguments[0].click();", driver.find_element(By.ID, "go"))
time.sleep(2)

while True:
    # wait for the table to appear and load it into a list of pandas dataframes
    table = wait.until(EC.presence_of_element_located((By.ID, "qmmt-time-and-sales-data-table")))
    # recent pandas versions expect a file-like object rather than a raw HTML string
    df = pd.read_html(StringIO(table.get_attribute("outerHTML")))
    print(df[0])

    # wait for the offset select to be present and forward it 1 min
    offset_select = Select(wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "select[name=timeOffset]"))))
    offset_select.select_by_value("1")

    time.sleep(2)

    # TODO: think of a break condition

Note that this runs really slowly on my machine, and I am not sure how well it would run on yours, but it keeps advancing 1 minute at a time in an endless loop (you will need to stop it at some point).
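
One possible break condition, as a rough sketch: bound the loop by the minutes you actually want instead of using while True. The helper name minute_marks and the 16:00 end time are assumptions made for illustration (the answer does not say where to stop); only the 09:30 start comes from the question.

```python
from datetime import datetime, timedelta


def minute_marks(start="09:30", end="16:00"):
    """Yield HH:MM strings from start to end inclusive, one minute apart.

    The default end of 16:00 is an assumed stopping point, not something
    taken from the page itself.
    """
    fmt = "%H:%M"
    current = datetime.strptime(start, fmt)
    last = datetime.strptime(end, fmt)
    while current <= last:
        yield current.strftime(fmt)
        current += timedelta(minutes=1)


marks = list(minute_marks())
print(len(marks), marks[0], marks[-1])  # 391 09:30 16:00
```

In the scraping loop, iterating with for mark in minute_marks(): in place of while True: would end the run after the last minute. Appending df[0] to a list on each pass and calling pd.concat(frames, ignore_index=True) afterwards would give the single DataFrame the question asks for.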

Thank you! I am having an error while running this. Message: 'geckodriver' executable needs to be in PATH — Evy555, Jan 11 '17 at 22:21
@Evy555 yeah, that's a [common problem with current selenium/firefox](http://stackoverflow.com/questions/40208051/selenium-using-python-geckodriver-executable-needs-to-be-in-path). — alecxe, Jan 11 '17 at 22:52
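
A minimal sketch of one way to diagnose that error, assuming Selenium 4 and a geckodriver binary you have downloaded yourself. The helper names find_driver_binary and make_firefox_driver are hypothetical; shutil.which, Service(executable_path=...) and webdriver.Firefox(service=...) are the real calls. Newer Selenium releases may be able to locate or fetch the driver on their own, so this may not be needed on an up-to-date install.

```python
import shutil


def find_driver_binary(name="geckodriver"):
    """Return the full path of `name` if it is on PATH, otherwise None."""
    return shutil.which(name)


def make_firefox_driver(driver_path=None):
    """Start Firefox using an explicit geckodriver path when one is given."""
    # Imported here so the PATH check above works even without selenium installed.
    from selenium import webdriver
    from selenium.webdriver.firefox.service import Service

    path = driver_path or find_driver_binary()
    if path is None:
        raise FileNotFoundError(
            "geckodriver not found on PATH; pass driver_path explicitly"
        )
    return webdriver.Firefox(service=Service(executable_path=path))


print(find_driver_binary() or "geckodriver is not on PATH")
```

If the check prints that geckodriver is not on PATH, either add its folder to PATH or call make_firefox_driver with the location of the binary on your machine.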

score 0 · Answer 2 · edited Jun 20 '20 at 09:12

This page is rendered by JavaScript. If you disable JS in your browser, the output of this page is:

requests and pandas only handle the raw HTML code; they do not execute JavaScript.
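
To illustrate the point with a self-contained sketch: the HTML below is a made-up stand-in for the real page (the table id "data" and the script are hypothetical), with a table that only a browser would fill in. Parsing the raw markup, which is all that requests or pandas.read_html get to see, finds no rows.

```python
from html.parser import HTMLParser

# Hypothetical page: the table is empty in the markup and filled by a script.
RAW_HTML = """
<html><body>
<table id="data"></table>
<script>
  // A browser would run this; a plain HTTP client never does.
  document.getElementById("data").insertRow().insertCell().textContent = "09:30";
</script>
</body></html>
"""


class RowCounter(HTMLParser):
    """Count the <tr> start tags present in the raw markup."""

    def __init__(self):
        super().__init__()
        self.rows = 0

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows += 1


def count_rows(html):
    parser = RowCounter()
    parser.feed(html)
    return parser.rows


print(count_rows(RAW_HTML))  # 0: the rows only exist after JavaScript runs
```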

answered Jan 11 '17 at 02:18 · 宏杰李

So am I not able to access the information because it's rendered in JavaScript? – Evy555 Jan 11 '17 at 03:16
@Evy555 yes, use selenium if you want to interact with the browser. – 宏杰李 Jan 11 '17 at 03:17
