2

I have been trying to scrape some data from https://www.eia.gov/coal/markets/ using Beautiful Soup. However, when I parse the contents, some of the data does not show up at all. Those data fields are visible in the Chrome inspector but not in the soup. They do not seem to be plain text elements; I think they are fed from an external database. I have attached screenshots below. Is there any other way to scrape that data?

Thanks in advance.

Chrome inspector:

[screenshot: the data fields as shown in the Chrome inspector]

Beautiful Soup parsed content:

[screenshot: the same elements in the Beautiful Soup output, with the data missing]

Rexon
  • 141
  • 1
  • 10

4 Answers

1

There is not enough detail in your question, but this information is probably loaded dynamically and you are not fetching the complete page source. Without your code it's tough to tell whether you're using Selenium (you tagged this question as such). If you are, you may be reading page_source, which does not guarantee the fully rendered source of the page you're looking at. If you're using requests, it's even less likely that you're capturing the page's completed source.
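One way to check this before reaching for a browser is to count the rows of the target table in whatever HTML you actually received. The sketch below uses only the standard library so it runs without extra packages; the table id `snl_dpst` is taken from the asker's follow-up code, while `TableRowCounter`, `count_rows`, `fetch_html` and the two sample snippets are made up for illustration.

```python
import urllib.request
from html.parser import HTMLParser

URL = "https://www.eia.gov/coal/markets/"
TABLE_ID = "snl_dpst"  # table id taken from the asker's follow-up code


class TableRowCounter(HTMLParser):
    """Counts <tr> tags inside the <table> that has a given id."""

    def __init__(self, table_id):
        super().__init__()
        self.table_id = table_id
        self.depth = 0      # > 0 while the parser is inside the target table
        self.found = False
        self.rows = 0

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            if self.depth:
                self.depth += 1  # nested table inside the target table
            elif dict(attrs).get("id") == self.table_id:
                self.found = True
                self.depth = 1
        elif tag == "tr" and self.depth:
            self.rows += 1  # also counts rows of nested tables

    def handle_endtag(self, tag):
        if tag == "table" and self.depth:
            self.depth -= 1


def count_rows(html, table_id):
    """Return the number of rows in the table, or None if the table is absent."""
    parser = TableRowCounter(table_id)
    parser.feed(html)
    return parser.rows if parser.found else None


def fetch_html(url):
    """Download the raw HTML exactly as the server sends it (no JavaScript runs)."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


# Made-up snippets: an empty table as a server might send it, and the same
# table after a script has filled it in.
static_html = '<table id="snl_dpst"></table>'
rendered_html = '<table id="snl_dpst"><tr><td>x</td></tr></table>'
print(count_rows(static_html, TABLE_ID))    # 0
print(count_rows(rendered_html, TABLE_ID))  # 1
```

Running `count_rows(fetch_html(URL), TABLE_ID)` against the live page should tell you which case you are in: `None` or `0` would suggest the rows are added later by a script, while a positive number would mean the data is already in the static HTML.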

DMart
  • 2,401
  • 1
  • 14
  • 19
1

The data is loaded via AJAX, so it is not available in the initial document. If you open the Network tab in Chrome DevTools, you will see that the site requests https://www.eia.gov/coal/markets/coal_markets_json.php. I searched the response for some of the numbers, and it looks like the data you are looking for is there.

This is a direct JSON response from the backend. It's better than Selenium if you can get it to work.
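A minimal sketch of that approach, using only the standard library, might look like the following. The endpoint URL is the one named above; everything else (`fetch_json`, `describe`, the `opener` parameter, the User-Agent header and the sample payload) is an assumption for illustration. I have not documented the structure of the response here, so the code only prints the shape of whatever comes back rather than assuming particular keys.

```python
import json
import urllib.request

JSON_URL = "https://www.eia.gov/coal/markets/coal_markets_json.php"


def fetch_json(url=JSON_URL, opener=urllib.request.urlopen, timeout=30):
    """Download `url` and decode the body as JSON.

    `opener` is injectable so the function can be exercised without a network.
    """
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with opener(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def describe(payload, indent=0, max_depth=2):
    """Summarise the shape of decoded JSON as a list of text lines."""
    pad = "  " * indent
    lines = []
    if isinstance(payload, dict):
        for key, value in payload.items():
            lines.append(f"{pad}{key}: {type(value).__name__}")
            if indent < max_depth:
                lines.extend(describe(value, indent + 1, max_depth))
    elif isinstance(payload, list) and payload:
        lines.append(f"{pad}{len(payload)} items, first is {type(payload[0]).__name__}")
        if indent < max_depth:
            lines.extend(describe(payload[0], indent + 1, max_depth))
    return lines


# Hypothetical payload, NOT the real response; it only demonstrates describe().
sample = {"hypothetical_key": [{"x": 1}, {"x": 2}]}
for line in describe(sample):
    print(line)
```

Calling `describe(fetch_json())` against the live endpoint should show which keys hold the numbers you saw in the inspector. If the body turns out not to be plain JSON, `json.loads` will raise and you will need to look at the raw text first.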

sarartur
  • 1,178
  • 1
  • 4
  • 13
1

@DMart is correct. The data you are looking for is populated by JavaScript; have a look at line 1629 in the page source. Beautiful Soup only parses the HTML it is given and does not act as a browser, so there is nowhere for the JS to execute. So it looks like Selenium is your best bet.

See This thread for more information.

0

Thank you all!

Opening the page with a Selenium webdriver and then parsing the page source with Beautiful Soup worked.

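from bs4 import BeautifulSoup as BS
from selenium import webdriver

driver = webdriver.Chrome()  # assumes Chrome; use the driver for your browser
driver.get('https://www.eia.gov/coal/markets/')

# page_source holds the DOM after the browser has run the page's JavaScript
html = driver.page_source
driver.quit()
soup = BS(html, 'html.parser')

table = soup.find("table", {'id': 'snl_dpst'})
rows = table.find_all("tr")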
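Once the rendered HTML is in hand, the `tr` elements still have to be turned into values. Here is a rough sketch of one way to do that, assuming the table uses ordinary `th`/`td` cells; `table_to_rows`, `rows_to_csv` and the sample markup are made up for illustration, and the sample stands in for `page_source` so the snippet runs without a browser.

```python
import csv
import io

from bs4 import BeautifulSoup


def table_to_rows(html, table_id):
    """Return the table's rows as lists of cell text (empty list if no table)."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", {"id": table_id})
    if table is None:
        return []
    rows = []
    for tr in table.find_all("tr"):
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:  # skip rows that contain no header or data cells
            rows.append(cells)
    return rows


def rows_to_csv(rows):
    """Serialise the rows as CSV text."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Made-up markup standing in for the rendered page source; not real EIA data.
sample = """
<table id="snl_dpst">
  <tr><th>Region</th><th>Price</th></tr>
  <tr><td>Example A</td><td>1.00</td></tr>
</table>
"""
print(rows_to_csv(table_to_rows(sample, "snl_dpst")))
```

With the real page you would pass the rendered page source in place of `sample`.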


Rexon
  • 141
  • 1
  • 10