
I am trying to download the fully generated HTML source for the following URL: http://www.morningstar.com/funds/xnas/vinix/quote.html

In particular, I am interested in extracting the generated numerical data in the table under the header "Performance VINIX", for instance the row "Growth in 10,000". I have tried the approach outlined in this popular answer, but the saved HTML file looks just like the raw, pre-rendered source: all of the JavaScript and none of the generated content. For instance, when I grep for the word "Growth" I get nothing.

I have also gone through the DOM structure in Chrome DevTools to identify the innermost element that contains this table, whose XPath is /html/body, used find_element_by_xpath to isolate that element, and then saved the following string:

content = browser.find_element_by_xpath('/html/body').text

Still that did not work. Any idea why? Many thanks!

John Jiang

1 Answer


If you want to get the already generated table, you need to wait until it is present in the DOM. Also note that the table is located inside an iframe, so you need to switch to that frame before searching for the required elements:

from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait as wait

# Wait up to 20 seconds for the quote iframe to load, then switch into it
wait(browser, 20).until(EC.frame_to_be_available_and_switch_to_it((By.XPATH, '//iframe[starts-with(@id, "QT_IFRAME_")]')))
# Inside the frame, wait up to 20 seconds for the performance table to appear in the DOM
table = wait(browser, 20).until(EC.presence_of_element_located((By.ID, "idPerformanceContent")))

Then you can scrape the required data:

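# Select every cell in the "Growth of 10,000" row; [1:] skips the first cell, which holds the row label
# find_elements(By.XPATH, ...) replaces find_elements_by_xpath, which was removed in Selenium 4
for i in table.find_elements(By.XPATH, './/tr[td="Growth of 10,000"]/td')[1:]:
    print(i.text)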
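
Since the goal is numerical data, the cell strings probably need converting to numbers once they have been scraped. The following is only a minimal sketch: parse_cells is a hypothetical helper, the sample values are made up for illustration, and treating blank cells or dash placeholders as missing is an assumption about how the page renders empty values.

```python
from typing import List, Optional


def parse_cells(cell_texts: List[str]) -> List[Optional[float]]:
    """Convert scraped cell strings such as '12,345.67' to floats.

    Blank cells and dash placeholders become None (an assumption about
    how missing values might be rendered on the page).
    """
    values: List[Optional[float]] = []
    for text in cell_texts:
        # Drop surrounding whitespace and thousands separators
        cleaned = text.strip().replace(",", "")
        if cleaned in ("", "-", "\u2013", "\u2014"):
            values.append(None)
        else:
            values.append(float(cleaned))
    return values


# Made-up cell texts standing in for [i.text for i in cells]
sample = ["10,000", "11,376.50", "\u2014", " 12,941 "]
print(parse_cells(sample))  # [10000.0, 11376.5, None, 12941.0]
```

With the loop above, the input could be collected as a list of the `.text` values of the row's cells instead of printing each one.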
Andersson
  • Thanks a lot Andersson. As a front end novice there is little way I could have figured this out myself in 2 hours. – John Jiang Apr 23 '17 at 14:39