I want to extract only a Celsius temperature variable from a web server which updates every few seconds. My code so far is:

from bs4 import BeautifulSoup
from urllib.request import urlopen

url = "http://192.168.251.184"
page = urlopen(url)
html = page.read().decode("utf-8")
soup = BeautifulSoup(html, "html.parser")
print(soup.get_text())

But this also prints raw HTML-formatted data, not just the temperature.

I have tried, unsuccessfully, to use the find() function to print just the variable, i.e. 19.44.

An extract of the page source is below (the target value, 19.44, is on the third-last line):

<!DOCTYPE HTML><html>
<head>
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <link rel="stylesheet" href="https://use.fontawesome.com/releases/v5.7.2/css/all.css" integrity="sha384-fnmOCqbTlWIlj8LyTjo7mOUStjsKC4pOpQbqyi7RrhN7udi9RwhKkMHpvLbHG9Sr" crossorigin="anonymous">
  <style>
    html {
     font-family: Arial;
     display: inline-block;
     margin: 0px auto;
     text-align: center;
    }
    h2 { font-size: 3.0rem; }
    p { font-size: 3.0rem; }
    .units { font-size: 1.2rem; }
    .ds-labels{
      font-size: 1.5rem;
      vertical-align:middle;
      padding-bottom: 15px;
    }
  </style>
</head>
<body>
  <h2>ESP DS18B20 Server</h2>
  <p>
    <i class="fas fa-thermometer-half" style="color:#059e8a;"></i> 
    <span class="ds-labels">Temperature Celsius</span> 
    <span id="temperaturec">19.44</span>
    <sup class="units">&deg;C</sup>
  </p>

Could you please help me scrape just the Celsius variable?

Richard
  • You can just do `soup.find("span",{"id":"temperaturec"}).text`; see https://stackoverflow.com/a/2136323/12446721 for details. – imxitiz Jul 08 '21 at 05:52

3 Answers

1
float(soup.find("span", {"id": "temperaturec"}).text)
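
As a minimal sketch, here is a more defensive variant of the one-liner above. `find` returns `None` when nothing matches, so calling `.text` on the result raises `AttributeError` if the element is missing. The helper name `parse_celsius` is hypothetical, and the sample HTML is cut down from the page source in the question.

```python
from bs4 import BeautifulSoup


def parse_celsius(html):
    """Return the Celsius reading as a float, or None if it cannot be read."""
    soup = BeautifulSoup(html, "html.parser")
    span = soup.find("span", {"id": "temperaturec"})
    if span is None:
        return None  # element missing, e.g. the page layout changed
    try:
        return float(span.get_text(strip=True))
    except ValueError:
        return None  # element present, but its text is not a number


# Cut-down sample based on the page source in the question.
sample = (
    '<p><span class="ds-labels">Temperature Celsius</span> '
    '<span id="temperaturec">19.44</span></p>'
)
print(parse_celsius(sample))
```

Returning `None` instead of raising is only one possible choice; raising a descriptive exception may suit a logging script better.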
OKEE
0

Yes. Here's how:

celsius = soup.find("span", {"id": "temperaturec"}).text
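
Since the question says the page updates every few seconds, note that a parsed `soup` is a snapshot: the page has to be fetched and parsed again to see a new value. The following is a rough sketch of that, not a definitive implementation. The function names, the 5-second interval, the timeout, and the `fetch` parameter (added so the parsing can be tried without the device) are all assumptions for illustration.

```python
import time
from urllib.request import urlopen

from bs4 import BeautifulSoup

URL = "http://192.168.251.184"  # address taken from the question


def read_celsius(html):
    """Return the text of the temperaturec span, or None if it is absent."""
    soup = BeautifulSoup(html, "html.parser")
    span = soup.find("span", {"id": "temperaturec"})
    return None if span is None else span.text.strip()


def fetch_html(url=URL, timeout=5):
    """Download the page and decode it as UTF-8 (timeout is an assumed value)."""
    with urlopen(url, timeout=timeout) as page:
        return page.read().decode("utf-8")


def poll(interval_s=5.0, count=3, fetch=fetch_html):
    """Fetch and parse the page `count` times, sleeping between requests."""
    readings = []
    for i in range(count):
        readings.append(read_celsius(fetch()))
        if i < count - 1:
            time.sleep(interval_s)  # assumed interval; match it to the server
    return readings
```

Calling `poll()` with the defaults requests the device three times. Passing a different `fetch` callable, for example one that returns a saved copy of the page, exercises the same parsing without network access.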
-1

The temperature is in the second span tag, i.e. <span id="temperaturec">19.44</span>, so we can collect the spans with soup.find_all and print that tag's content using .string:

from bs4 import BeautifulSoup as bs

html_doc = """
<!DOCTYPE HTML><html>
<head>
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <link rel="stylesheet" href="https://use.fontawesome.com/releases/v5.7.2/css/all.css" integrity="sha384-fnmOCqbTlWIlj8LyTjo7mOUStjsKC4pOpQbqyi7RrhN7udi9RwhKkMHpvLbHG9Sr" crossorigin="anonymous">
  <style>
    html {
     font-family: Arial;
     display: inline-block;
     margin: 0px auto;
     text-align: center;
    }
    h2 { font-size: 3.0rem; }
    p { font-size: 3.0rem; }
    .units { font-size: 1.2rem; }
    .ds-labels{
      font-size: 1.5rem;
      vertical-align:middle;
      padding-bottom: 15px;
    }
  </style>
</head>
<body>
  <h2>ESP DS18B20 Server</h2>
  <p>
    <i class="fas fa-thermometer-half" style="color:#059e8a;"></i> 
    <span class="ds-labels">Temperature Celsius</span> 
    <span id="temperaturec">19.44</span>
    <sup class="units">&deg;C</sup>
  </p>
</div>"""


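# Parse the document and collect every <span> in document order.
soup = bs(html_doc, 'html.parser')
list_of_spans = soup.find_all('span')
# The temperature is the second span (index 1); this depends on the page layout.
print(list_of_spans[1].string)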
Tanish Sarmah
  • I would be glad to know what this post is lacking. – Tanish Sarmah Jul 08 '21 at 09:56
  • There is no need to `find_all` the spans, because the one span of interest has a distinguishing attribute (`id`) that lets it be found more quickly and more accurately. Besides, your approach would not work if more spans were added to the page before the one containing the temperature. For these reasons, @SlLoWre's answer is better and more accurate. – Captain Jack Sparrow May 12 '22 at 12:38
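
To illustrate the point made in the comment above, here is a small sketch using a hypothetical variant of the page in which a humidity reading (invented for this example, not part of the original page) appears before the temperature. The index-based lookup then picks up the wrong span, while the `id` lookup is unaffected.

```python
from bs4 import BeautifulSoup

# Hypothetical page variant: an extra label/value pair of spans comes first.
html = """
<p>
  <span class="ds-labels">Humidity</span>
  <span id="humidity">40</span>
</p>
<p>
  <span class="ds-labels">Temperature Celsius</span>
  <span id="temperaturec">19.44</span>
</p>
"""

soup = BeautifulSoup(html, "html.parser")

# Positional lookup: the second span on the page is now the humidity value.
by_index = soup.find_all("span")[1].string

# Attribute lookup: still finds the temperature wherever it sits on the page.
by_id = soup.find("span", {"id": "temperaturec"}).string

print(by_index)
print(by_id)
```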