Beautiful Soup not picking up some data from the website

Question

I have been trying to scrape some data from https://www.eia.gov/coal/markets/ using Beautiful Soup. However, when I parse the contents, some of the data does not show up at all. Those data fields are visible in the Chrome inspector but not in the soup. They do not seem to be plain text elements; I think they are fed from an external database. I have attached the screenshots below. Is there any other way to scrape that data?

Thanks in advance.

Google inspector:

Beautiful soup parsed content:

The data is dynamically loaded from https://www.eia.gov/coal/markets/coal_markets_json.php after the page gets loaded. — Epsi95, Feb 12 '21 at 16:25

score 1 · Answer 1 · answered Feb 12 '21 at 16:18

There is not enough detail in your question, but this information is probably loaded dynamically and you're not fetching the entire page source. Without your code it's hard to tell whether you're using Selenium (you tagged this question as such). If you are, you may be reading page_source, which does not guarantee the complete, fully rendered source of the page you're looking at. If you're using requests, it's even less likely that you're capturing the page's completed source.

score 1 · Answer 2 · answered Feb 12 '21 at 16:25

The data is loaded via AJAX, so it is not available in the initial document. If you open the Network tab in Chrome DevTools, you will see that the site sends a request to https://www.eia.gov/coal/markets/coal_markets_json.php. I searched for some of the numbers in the response, and it looks like the data you are looking for is there.

This is a direct JSON response from the backend. It's better than Selenium if you can get it to work.
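
As a rough sketch of that approach, the endpoint can be requested directly and decoded as JSON, with no browser involved. Only the URL comes from this thread. The helper names `fetch_text` and `summarize`, the `User-Agent` header, and the timeout are assumptions made for illustration, and the structure of the response is not documented here, so the code only reports the top-level shape instead of assuming a schema.

```python
import json
import urllib.request

# URL quoted in this thread; everything else below is illustrative.
DATA_URL = "https://www.eia.gov/coal/markets/coal_markets_json.php"


def fetch_text(url, timeout=30):
    """Download the response body as text.

    The browser-like User-Agent is an assumption; the site may not require it.
    """
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


def summarize(payload):
    """Describe the top level of a decoded JSON value without assuming its schema."""
    if isinstance(payload, dict):
        return {"type": "object", "keys": sorted(payload)}
    if isinstance(payload, list):
        return {"type": "array", "length": len(payload)}
    return {"type": type(payload).__name__}
```

Calling `summarize(json.loads(fetch_text(DATA_URL)))` should show which keys or how many items the response contains, which is a reasonable first step before picking out the fields you saw in the inspector.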

edited Feb 13 '21 at 05:33

answered Feb 12 '21 at 16:25

sarartur


How can you see that? I didn't find this URL in the Network tab of dev tools. Is it somewhere specific? – IoaTzimas Feb 12 '21 at 16:39
Refresh the page while you have the network tab open. It should come up towards the bottom. – sarartur Feb 12 '21 at 16:41

score 1 · Accepted Answer · answered Feb 12 '21 at 16:39


@DMart is correct. The data you are looking for is populated by JavaScript; have a look at line 1629 in the page source. Beautiful Soup doesn't act as a browser, so there is nowhere for the JS to execute. Selenium therefore looks like your best bet.

See this thread for more information.
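
To illustrate the point about JavaScript, here is a small self-contained sketch. The HTML is a made-up stand-in, not the real EIA page, and the table id `prices` is hypothetical. The table is empty in the markup and would only be filled in by the script, which Beautiful Soup treats as plain text.

```python
from bs4 import BeautifulSoup

# Made-up stand-in for a page whose table is filled in by a script.
HTML = """
<table id="prices"><tbody></tbody></table>
<script>
  document.querySelector("#prices tbody").innerHTML = "<tr><td>42.00</td></tr>";
</script>
"""

soup = BeautifulSoup(HTML, "html.parser")

# The parser sees only the static markup, so the table has no rows.
table = soup.find("table", {"id": "prices"})
rows = table.find_all("tr")
print(len(rows))  # prints 0: the script text is never executed

# The value exists only inside the script source, as an unevaluated string.
print("42.00" in str(soup.find("script")))
```

A browser would run the script and show one row, which is why the inspector and the soup disagree.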


cannonfodda


LOL, he said I was correct but you selected this answer as right! – DMart Feb 12 '21 at 19:57

score 0 · Answer 4 · answered Feb 12 '21 at 19:01

Thank you all!

Opening the page with a Selenium webdriver and then parsing the page source with Beautiful Soup worked.

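from bs4 import BeautifulSoup as BS
from selenium import webdriver

driver = webdriver.Chrome()  # assumption: the browser used was not stated
driver.get('https://www.eia.gov/coal/markets/')

# page_source is the DOM as the browser currently has it, after scripts have run
html = driver.page_source
driver.quit()
soup = BS(html, 'html.parser')

table = soup.find("table", {'id': 'snl_dpst'})
rows = table.find_all("tr")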
