There are lots of helpful questions and answers regarding BeautifulSoup table parsing; however, the issue I am running into is that I cannot find the data I'm looking for when I prettify the soup.

I'm looking to turn commodity prices into a pandas dataframe. My code (so far) is below:

from urllib.request import urlopen
from bs4 import BeautifulSoup

# Download the HTML exactly as the server sends it (urlopen runs no JavaScript)
page = urlopen('http://futures.tradingcharts.com/marketquotes/HE.html')
soup = BeautifulSoup(page, 'html.parser')
print(soup.prettify())

I've looked at the BeautifulSoup documentation and may be missing something, but I've combed through what my console prints and cannot find even the 'Open', 'High', etc. prices in the prettified HTML. As a result, I can't figure out which tags to search for.

Please advise - thank you for your help.

David William
  • Are you sure the prices are present when the page is first loaded? Perhaps the page has some javascript that dynamically adds price information at some point after the page initially loads. – John Gordon Jan 29 '18 at 18:23
  • I am sure the prices are present when the page is first loaded. There are references to javascript in the prettified html, though I am unsure as to whether that's the means by which it's adding the prices. – David William Jan 29 '18 at 18:27
  • @DavidWilliam, no, they are not loaded with the page. They are loaded by some other script; I just checked. Use `selenium` to get the data. – Stack Jan 29 '18 at 18:31
  • I find an easy way to debug the issues with dynamically generated contents is to save the received `page` and open it in your browser to confirm the contents. If it's not present, as @Stack mentioned, use `selenium`. – r.ook Jan 29 '18 at 18:33
  • When I load the URL in my browser, the prices come up; that is what I was referring to when I said "the page is first loaded." I will try using selenium. Thank you. – David William Jan 29 '18 at 18:35
  • It is a JavaScript-rendered page. Use selenium and then pd.read_html(). Check these out: https://stackoverflow.com/questions/25062365/python-parsing-html-table-generated-by-javascript and https://stackoverflow.com/questions/42128760/converting-html-table-to-a-pandas-dataframe – Pygirl Jan 29 '18 at 21:11
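
The suggestions in the comments above can be sketched roughly as follows. This is only an illustration, not a tested solution for this particular site. The helper names `headers_in_static_html` and `fetch_rendered_tables` are made up for the example, and it assumes that selenium, a Chrome driver, pandas and an HTML parser such as lxml are installed, and that the rendered quotes end up in an ordinary `<table>` element (which has not been confirmed for this page).

```python
from io import StringIO


def headers_in_static_html(html, headers=("Open", "High")):
    """Crude check: True only if every expected heading text occurs in the raw HTML.

    A plain substring test, so it can give false positives; it is meant as a
    quick way to spot content that is missing from the server's response.
    """
    return all(h in html for h in headers)


def fetch_rendered_tables(url, wait_seconds=10):
    """Load the page in a real browser, then parse every <table> with pandas.

    Assumption: the JavaScript on the page inserts the quotes as a <table>.
    """
    # Imported here so the helper above works without these packages installed.
    import pandas as pd
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Wait until at least one <table> exists in the rendered DOM.
        WebDriverWait(driver, wait_seconds).until(
            EC.presence_of_element_located((By.TAG_NAME, "table"))
        )
        html = driver.page_source  # HTML after the scripts have run
    finally:
        driver.quit()
    # read_html returns a list of DataFrames, one per table it can parse.
    return pd.read_html(StringIO(html))
```

To try it, one could pass `urlopen(url).read().decode()` to `headers_in_static_html`; if that returns False, the prices are probably being added by a script, and `fetch_rendered_tables(url)` can be called to get a list of DataFrames to pick the quotes table from.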
