
Using BeautifulSoup, I am trying to scrape MarketWatch:

from bs4 import BeautifulSoup
import requests
import pandas


url = "https://www.marketwatch.com/investing/stock/khc/profile"
# Make a GET request to fetch the raw HTML content
html_content = requests.get(url).text

soup = BeautifulSoup(html_content, "lxml")

Now I would like to extract the "P/E Current" and the "Price to Sales Ratio" values, which in the HTML sit inside the following elements:

[...]
<div class="sixwide addgutter">

    <div class="block threewide addgutter">

        <h2>Valuation</h2>    
                <div class="section">    
            <p class="column">P/E Current</p>    
            <p class="data lastcolumn">19.27</p>    
        </div>    
                <div class="section">    
            <p class="column">P/E Ratio (with extraordinary items)</p>    
            <p class="data lastcolumn">19.55</p>    
        </div>    
                <div class="section">
            <p class="column">P/E Ratio (without extraordinary items)</p>
            <p class="data lastcolumn">20.00</p>
        </div>
                <div class="section">
            <p class="column">Price to Sales Ratio</p>
            <p class="data lastcolumn">1.55</p>
        </div>
                <div class="section">
            <p class="column">Price to Book Ratio</p>
            <p class="data lastcolumn">0.75</p>
        </div>
        [...]

How can I get them?

I use the command

section = soup.findAll('div', {'class' : 'section'})

but then I don't know how to proceed to get the values I am interested in. Can you help?

gabboshow
  • Does this answer your question? [python BeautifulSoup parsing table](https://stackoverflow.com/questions/23377533/python-beautifulsoup-parsing-table) – Ken Kinder Jun 01 '20 at 14:25

2 Answers


You can do the following:

  1. find all sections
  2. loop through the sections and find its p elements
  3. if the p contains your search text, get its next sibling with the value.

import requests
from bs4 import BeautifulSoup

url = "https://www.marketwatch.com/investing/stock/khc/profile"
html_content = requests.get(url).text

soup = BeautifulSoup(html_content, "lxml")
sections = soup.find_all('div', attrs={'class': 'section'})
for section in sections:
    ps = section.find('p')  # the first <p> holds the label
    if "P/E Current" in ps.getText() or "Price to Sales Ratio" in ps.getText():
        # skip the whitespace text node between the two <p> tags to reach the value
        val = ps.nextSibling.nextSibling
        print(f"{ps.getText()}: {val.getText()}")

OUT: P/E Current: 19.27
     Price to Sales Ratio: 1.55
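
A slightly sturdier variant, if you prefer not to hop over the whitespace text nodes manually: `find_next_sibling("p")` jumps straight to the next `<p>` tag. A minimal sketch under the same assumptions about the page structure:

for section in sections:
    label = section.find('p')
    if "P/E Current" in label.getText() or "Price to Sales Ratio" in label.getText():
        # find_next_sibling('p') returns the next <p> sibling, ignoring whitespace nodes
        value = label.find_next_sibling('p')
        print(f"{label.getText()}: {value.getText()}")
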
Nico Müller

This solution gets the data from all sections under the div with class "sixwide addgutter". The result is a list of dictionaries:

import requests
from bs4 import BeautifulSoup

url = "https://www.marketwatch.com/investing/stock/khc/profile"
req = requests.get(url)
soup = BeautifulSoup(req.content, 'lxml')

base = soup.find('div', attrs={'class' : 'sixwide addgutter'})
section = base.find_all('div', attrs={'class' : 'section'})

all_data = []
for item in section:
    data = {}
    data['name'] = item.p.text                 # label, e.g. "P/E Current"
    data['value'] = item.p.findNext('p').text  # the following <p> holds the value
    all_data.append(data)

Output sample:

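Reconstructed from the figures in the question's HTML snippet (the original post showed a screenshot), the first entries of all_data would look roughly like this:

[{'name': 'P/E Current', 'value': '19.27'},
 {'name': 'P/E Ratio (with extraordinary items)', 'value': '19.55'},
 {'name': 'P/E Ratio (without extraordinary items)', 'value': '20.00'},
 {'name': 'Price to Sales Ratio', 'value': '1.55'},
 {'name': 'Price to Book Ratio', 'value': '0.75'}]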

If you want the specific values, "P/E Current" and "Price to Sales Ratio" are all_data[0] and all_data[3] respectively.
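
Rather than relying on fixed positions, a lookup keyed by name is less brittle if MarketWatch reorders the sections; a small sketch using the all_data list built above:

# Build a name -> value mapping so entries can be fetched by label
lookup = {d['name']: d['value'] for d in all_data}
print(lookup.get('P/E Current'))           # e.g. '19.27'
print(lookup.get('Price to Sales Ratio'))  # e.g. '1.55'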

0buz
  • It's producing the following error: `---> 12 section = base.findAll('div', attrs={'class' : 'section'}) 13 14 all_data=[] AttributeError: 'NoneType' object has no attribute 'findAll'` @0buz – Aquaholic Jul 30 '21 at 10:46
  • `findAll` is now obsolete. Replace it with `find_all`. – 0buz Aug 02 '21 at 21:42
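
The AttributeError in that comment means soup.find() returned None, i.e. the "sixwide addgutter" div was not found (the page layout may have changed, or the request did not return the expected page). A minimal guard, assuming the same selectors as in the answer:

base = soup.find('div', attrs={'class' : 'sixwide addgutter'})
if base is None:
    # container missing: layout changed or the request was blocked
    raise SystemExit("Could not find the 'sixwide addgutter' div")
section = base.find_all('div', attrs={'class' : 'section'})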