I am scraping the WSJ site for posts about commodities and futures. Looking at the structure of the page, I found the piece of HTML that holds all the information I need for one post.
Here is that piece of HTML:
<article class="WSJTheme--story--XB4V2mLz WSJTheme--design-refresh--2eDQsiEp WSJTheme--design-refresh-4u--WkTDMafN " data-id="SB10596806121828464523104587634361262943098">
<div class="WSJTheme--articleType--34Gt-vdG "> <span class="">Commodities</span></div>
<div class="WSJTheme--headline--7VCzo7Ay ">
<h2 class="WSJTheme--headline--unZqjb45 undefined ">
<a class="" href="https://www.wsj.com/articles/a-gold-mine-takeover-highlights-increasing-mining-sector-risk-11628433248"><span class="WSJTheme--headlineText--He1ANr9C ">A Gold Mine Takeover Highlights Increasing Mining-Sector Risk </span></a>
</h2>
</div>
<p class="WSJTheme--summary--lmOXEsbN typography--serif--1CqEfjrc ">
<span class="WSJTheme--summaryText--2LRaCWgJ ">
Kyrgyzstan’s nationalization of Centerra Gold’s large mining operation is one of the
most brazen moves in recent years by a country to assert control over valuable natural
resources, mining and legal experts say.
</span>
<span class="WSJTheme--stats--2HBLhVc9 "></span></p>
<div class="">
<p class="WSJTheme--byline--1oIUvtQ3 ">Jacquie McNish and Joe Wallace</p>
<div class="WSJTheme--timestamp--2zjbypGD ">
<p aria-label="Updated August 8, 2021" class="WSJTheme--timestamp--22sfkNDv ">August 8, 2021</p>
</div>
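As a check on the selectors, here is a minimal sketch that parses a trimmed copy of the fragment above from a string, with no network request involved. The helper names `text_of` and `parse_article` are made up for illustration, and matching on class substrings such as `articleType` assumes those parts of the class names stay stable while the hashed suffixes may change:

```python
from bs4 import BeautifulSoup

# Trimmed copy of the fragment above (summary omitted for brevity).
FRAGMENT = """
<article class="WSJTheme--story--XB4V2mLz" data-id="SB10596806121828464523104587634361262943098">
  <div class="WSJTheme--articleType--34Gt-vdG "><span class="">Commodities</span></div>
  <div class="WSJTheme--headline--7VCzo7Ay ">
    <h2 class="WSJTheme--headline--unZqjb45 undefined ">
      <a class="" href="https://www.wsj.com/articles/a-gold-mine-takeover-highlights-increasing-mining-sector-risk-11628433248"><span class="WSJTheme--headlineText--He1ANr9C ">A Gold Mine Takeover Highlights Increasing Mining-Sector Risk </span></a>
    </h2>
  </div>
  <p class="WSJTheme--byline--1oIUvtQ3 ">Jacquie McNish and Joe Wallace</p>
  <p aria-label="Updated August 8, 2021" class="WSJTheme--timestamp--22sfkNDv ">August 8, 2021</p>
</article>
"""


def text_of(tag):
    """Return the stripped text of a tag, or None if the tag is missing."""
    return tag.get_text(strip=True) if tag else None


def parse_article(article):
    """Collect the fields of one <article> element into a dict."""
    link = article.select_one("h2 a")
    return {
        "section": text_of(article.select_one('div[class*="articleType"]')),
        "headline": text_of(link),
        "url": link["href"] if link else None,
        "byline": text_of(article.select_one('p[class*="byline"]')),
        "date": text_of(article.select_one('p[class*="timestamp"]')),
    }


# html.parser is used here only so the sketch does not depend on lxml.
soup = BeautifulSoup(FRAGMENT, "html.parser")
posts = [parse_article(a) for a in soup.find_all("article")]
print(posts[0]["headline"])
```

Because missing tags come back as `None` instead of raising, an `<article>` without an `<h2>` would not stop the loop.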
My code for scraping looks like:
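import requests
from bs4 import BeautifulSoup

def scrape(self, src):
    # Fetch the page and parse it with the lxml parser
    source = requests.get(src).text
    soup = BeautifulSoup(source, 'lxml')
    # Print the headline of every <article> on the page
    for article in soup.find_all('article'):
        headline = article.h2.a.span.text
        if not headline:
            continue
        print(headline)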
After scraping all these posts, I get this output:
What Parents With Unvaccinated Kids Need to Know About the Delta Variant This Summer
JPMorgan, Goldman Call Time on Work-From-Home. Their Rivals Are Ready to Pounce.
Some Vaccinated People Are Dying of Covid-19. Here’s Why Scientists Aren’t Surprised.
Video Shows Demolition of Miami-Area Condo Building
How the EV Industry Is Trying to Fix Its Charging Bottleneck
Watch Chinese Astronauts’ First Spacewalk Outside New Space Station
However, I expected to get the headlines from the commodities page:
A Gold Mine Takeover Highlights Increasing Mining-Sector Risk
and so on.
This is the link to the page I scrape from: https://www.wsj.com/news/markets/oil-gold-commodities-futures
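One thing that could help narrow this down is checking what HTML the request actually returned, since the headlines I get do not look like commodities posts. Below is a rough sketch of such a check. `summarize_page` is a hypothetical helper, `SAMPLE` is a made-up stand-in for a response body, and the idea that the response differs from what the browser shows is only an assumption to test, not a confirmed cause:

```python
from collections import Counter

from bs4 import BeautifulSoup


def summarize_page(html):
    """Return the page title and a count of article section labels."""
    soup = BeautifulSoup(html, "html.parser")
    title = soup.title.get_text(strip=True) if soup.title else None
    sections = Counter()
    for article in soup.find_all("article"):
        # The section label sits in a div whose class contains "articleType"
        label = article.select_one('div[class*="articleType"]')
        sections[label.get_text(strip=True) if label else "(no label)"] += 1
    return {
        "title": title,
        "articles": sum(sections.values()),
        "sections": dict(sections),
    }


# Made-up stand-in for a response body, used only to show the output shape.
SAMPLE = """
<html><head><title>Sample page</title></head><body>
<article><div class="WSJTheme--articleType--34Gt-vdG "><span>Commodities</span></div>
<h2><a href="#"><span>First</span></a></h2></article>
<article><h2><a href="#"><span>Second</span></a></h2></article>
</body></html>
"""

summary = summarize_page(SAMPLE)
print(summary)
```

Passing `requests.get(src).text` to `summarize_page` instead of `SAMPLE` would show the title and section labels of whatever page came back, which can then be compared with the page seen in the browser.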