I'm trying to fetch and print the current temperature and city name from a local weather website, without success. All I need is to read and print the city (Londrina), the temperature (23.1°C) and, if possible, the title in ca-cond-firs ("Temperatura em declínio", i.e. "temperature falling"). That title changes as the temperature goes up or down.

This is the relevant part of the site's HTML:

<div class="ca-cidade"><a href="/site/internas/conteudo/meteorologia/grafico.shtml?id=23185109">Londrina</a></div>
<ul class="ca-condicoes">
<li class="ca-cond-firs"><img src="/site/imagens/icones_condicoes/temperatura/temp_baixa.png" title="Temperatura em declínio"/><br/>23.1°C</li>
<li class="ca-cond"><img src="/site/imagens/icones_condicoes/vento/L.png"/><br/>10 km/h</li>
<li class="ca-cond"><div class="ur">UR</div><br/>54%</li>
<li class="ca-cond"><img src="/site/imagens/icones_condicoes/chuva.png"/><br/>0.0 mm</li>

This is the code I have so far:

from bs4 import BeautifulSoup
import requests

URL = 'http://www.simepar.br/site/index.shtml'

rawhtml = requests.get(URL).text
soup = BeautifulSoup(rawhtml, 'lxml')

id = soup.find('a', 'id=23185109')
print(id)

Any help?

grissom

4 Answers

2
from bs4 import BeautifulSoup
import requests

URL = 'http://www.simepar.br/site/index.shtml'

rawhtml = requests.get(URL).text
soup = BeautifulSoup(rawhtml, 'html.parser') # parse page as html

temp_table = soup.find_all('table', {'class':'cidadeTempo'}) # get detail of table with class name cidadeTempo
for entity in temp_table:
    city_name = entity.find('h3').text # fetches name of city
    city_temp_max = entity.find('span', {'class':'tempMax'}).text # fetches max temperature
    city_temp_min = entity.find('span', {'class':'tempMin'}).text # fetches min temperature
    print("City :{} \t Max_temp: {} \t Min_temp: {}".format(city_name, city_temp_max, city_temp_min)) # prints content

The code below gets the current temperature details shown on the right side of the page, as you require.

# The page has only one div with class ca-content-wrapper, so find() can be
# used directly without iterating. Add an if condition to control which
# city's temperature is printed.
result_table = soup.find('div', {'class':'ca-content-wrapper'})
print(result_table.text)
# output will look like:
#   Apucarana
#
#   21.5°C
#   4 km/h
#   UR60%
#   0.0 mm
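
The "if condition" mentioned in the comment could look something like the sketch below. It runs against an inline sample instead of the live page: the markup is modelled on the snippet in the question, the `ca-bg` per-city block is an assumption borrowed from another answer in this thread, and `current_temp` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Inline sample modelled on the question's snippet. The nesting
# (ca-content-wrapper > ca-bg > ca-cidade) is an assumption, not verified
# against the live page.
SAMPLE_HTML = """
<div class="ca-content-wrapper">
  <div class="ca-bg">
    <div class="ca-cidade"><a href="#">Apucarana</a></div>
    <ul class="ca-condicoes">
      <li class="ca-cond-firs"><br/>21.5°C</li>
    </ul>
  </div>
  <div class="ca-bg">
    <div class="ca-cidade"><a href="#">Londrina</a></div>
    <ul class="ca-condicoes">
      <li class="ca-cond-firs"><img src="t.png" title="Temperatura em declínio"/><br/>23.1°C</li>
    </ul>
  </div>
</div>
"""


def current_temp(html, wanted_city):
    """Return the current temperature text for wanted_city, or None."""
    soup = BeautifulSoup(html, 'html.parser')
    wrapper = soup.find('div', {'class': 'ca-content-wrapper'})
    if wrapper is None:
        return None
    for block in wrapper.find_all('div', {'class': 'ca-bg'}):
        name = block.find('div', {'class': 'ca-cidade'})
        temp = block.find('li', {'class': 'ca-cond-firs'})
        # Only report the block whose city name matches exactly.
        if name and temp and name.get_text(strip=True) == wanted_city:
            return temp.get_text(strip=True)
    return None


print(current_temp(SAMPLE_HTML, 'Londrina'))
```

Returning None for an unknown city keeps the caller in charge of what to print when the city is missing.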
Gahan
  • Oh, and `city_name` is in wrong encoding - you'll probably want to replace it with `bytes(city_name, encoding='latin1').decode('utf-8')` when printing. – Błotosmętek Jun 30 '17 at 13:57
  • Your code works fine, but it fetches the forecast min/max temperatures for the day. I need the current values (15 min delay). If you look at www.simepar.br, they are on the right-hand side of the page. The city is Londrina. – grissom Jun 30 '17 at 14:05
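
The encoding repair suggested in the first comment can be tried without touching the site. This is a minimal sketch, assuming the page is served as UTF-8 but was decoded as Latin-1; `fix_mojibake` is a hypothetical helper name and the city name is only an illustrative string with a non-ASCII character.

```python
def fix_mojibake(text):
    """Undo UTF-8 bytes that were wrongly decoded as Latin-1.

    Text that is already correct is returned unchanged.
    """
    try:
        return text.encode('latin1').decode('utf-8')
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


# Simulate the wrong decoding, then repair it.
garbled = 'Foz do Iguaçu'.encode('utf-8').decode('latin1')
print(garbled)
print(fix_mojibake(garbled))
```

If the wrong guess comes from requests itself, setting `response.encoding = 'utf-8'` before reading `response.text` may avoid the problem at the source, assuming the page really is UTF-8.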
0

I'm not sure what problems you are running into with your code. When I tried it, I found that I needed to use html.parser to parse the website successfully. I also used soup.findAll() to find the elements that match the desired class. Hopefully the code below will lead you to your answer:

from bs4 import BeautifulSoup
import requests

URL = 'http://www.simepar.br/site/index.shtml'

rawhtml = requests.get(URL).text
soup = BeautifulSoup(rawhtml, 'html.parser')

# attrs must be a dict ({'class': ...}); the original {'class', ...} was a set
rows = soup.findAll('li', {'class': 'ca-cond-firs'})
print(rows)
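
Each matched row still contains the whole `<li>` tag. One possible way to pull out just the temperature text and the `title` of the icon is sketched below, using the `<li>` from the question as inline input; `parse_row` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# The <li> element quoted in the question, used here instead of the live page.
LI_HTML = ('<li class="ca-cond-firs"><img src="/site/imagens/icones_condicoes/'
           'temperatura/temp_baixa.png" title="Temperatura em declínio"/>'
           '<br/>23.1°C</li>')


def parse_row(row):
    """Return (temperature_text, status_title) for one li.ca-cond-firs tag."""
    img = row.find('img')
    # The title attribute may be missing, so fall back to None.
    status = img.get('title') if img else None
    return row.get_text(strip=True), status


soup = BeautifulSoup(LI_HTML, 'html.parser')
for row in soup.find_all('li', {'class': 'ca-cond-firs'}):
    temperature, status = parse_row(row)
    print('{} - {}'.format(temperature, status))
```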
Frumples
0

Here you go. You can customize the wind description depending on the icon name.

#!/usr/bin/env python
# -*- encoding: utf8 -*-
import sys

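# Python 2 only: force UTF-8 as the default encoding.
# On Python 3, remove these two lines (reload is not a builtin there).
reload(sys)
sys.setdefaultencoding('utf-8')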

from bs4 import BeautifulSoup
import requests

def get_weather_data():

    URL = 'http://www.simepar.br/site/index.shtml'

    rawhtml = requests.get(URL).text
    soup = BeautifulSoup(rawhtml, 'html.parser')

    cities = soup.find('div', {"class":"ca-content-wrapper"})

    weather_data = []

    for city in cities.findAll("div", {"class":"ca-bg"}):

        name = city.find("div", {"class":"ca-cidade"}).text
        temp = city.find("li", {"class":"ca-cond-firs"}).text

        conditions = city.findAll("li", {"class":"ca-cond"})

        weather_data.append({
            "city":name,
            "temp":temp,
            "conditions":[{
                "wind":conditions[0].text +" "+what_wind(conditions[0].find("img")["src"]),
                "humidity":conditions[1].text,
                "rain":conditions[2].text,
            }]
        })


    return weather_data

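def what_wind(img):
    # img is the icon path, e.g. ".../vento/L.png". Compare the file name only:
    # str.find() returns -1 (truthy) when the substring is absent, so
    # "if img.find(...)" would match almost every path.
    icon = img.rsplit("/", 1)[-1].split(".")[0]

    if icon == "NE":
        return "From North East"

    if icon == "O":
        return "From West"

    if icon == "N":
        return "From North"

    # you can add other icons here; unmapped icons fall through to this
    return "(unmapped icon " + icon + ")"


print(get_weather_data())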

And that is all weather data from that website.

pregmatch
  • Nice... but I got this error: reload(sys) NameError: name 'reload' is not defined – grissom Jun 30 '17 at 14:07
  • What version of Python are you using? Try this: https://stackoverflow.com/questions/10142764/nameerror-name-reload-is-not-defined. I added that so you don't get errors on non-UTF-8 characters. – pregmatch Jun 30 '17 at 14:20
  • I am on Python 2.7. This answer explains that reload: https://stackoverflow.com/a/961219/1108279 – pregmatch Jun 30 '17 at 14:23
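
The NameError discussed in these comments arises because `reload` is a builtin only on Python 2. A rough sketch of a version guard follows; `force_utf8_default` is a hypothetical helper name, and on Python 3 it does nothing because `str` is already Unicode.

```python
import sys


def force_utf8_default():
    """Apply the Python 2 default-encoding hack only where it exists.

    Returns True if the hack was applied, False otherwise.
    """
    if sys.version_info[0] < 3:
        reload(sys)  # noqa: F821 (builtin on Python 2 only)
        sys.setdefaultencoding('utf-8')
        return True
    # Python 3: nothing to do, the default encoding is already UTF-8.
    return False


applied = force_utf8_default()
print(u'23.1\u00b0C')
```

On Python 3 the simpler option is to delete the two lines altogether.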
0

You should try out the CSS3 selectors in BS4. I personally find them a lot easier to use than find and find_all.

from bs4 import BeautifulSoup
import requests

URL = 'http://www.simepar.br/site/index.shtml'

rawhtml = requests.get(URL).text
soup = BeautifulSoup(rawhtml, 'lxml')

# soup.select returns the list of all the elements that matches the CSS3 selector

# get the text inside each <a> tag inside div.ca-cidade
cities = [cityTag.text for cityTag in soup.select("div.ca-cidade > a")] 

# get the temperature inside each li.ca-cond-firs
temps = [tempTag.text for tempTag in soup.select("li.ca-cond-firs")]

# get the temperature status from the title attribute of each li.ca-cond-firs > img
tempStatus = [tag["title"] for tag in soup.select("li.ca-cond-firs > img")]

# len(cities) == len(temps) == len(tempStatus) => This is normally true.

for i in range(len(cities)):
    print("City: {}, Temperature: {}, Status: {}.".format(cities[i], temps[i], tempStatus[i]))
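
CSS selectors can also target the link by the id in its href, which is what the question's `soup.find('a', 'id=23185109')` seems to have been aiming for. The sketch below is run on the snippet from the question and assumes the city div and the conditions list are siblings, as they appear there; `lookup_by_link_id` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Snippet from the question, with the list closed so it parses cleanly.
SAMPLE_HTML = """
<div class="ca-cidade"><a href="/site/internas/conteudo/meteorologia/grafico.shtml?id=23185109">Londrina</a></div>
<ul class="ca-condicoes">
<li class="ca-cond-firs"><img src="/site/imagens/icones_condicoes/temperatura/temp_baixa.png" title="Temperatura em declínio"/><br/>23.1°C</li>
<li class="ca-cond"><div class="ur">UR</div><br/>54%</li>
</ul>
"""


def lookup_by_link_id(html, link_id):
    """Return (city, temperature, status) for the link whose href has link_id."""
    soup = BeautifulSoup(html, 'html.parser')
    # [href*="..."] matches any href containing the given substring.
    link = soup.select_one('div.ca-cidade > a[href*="id={}"]'.format(link_id))
    if link is None:
        return None
    # Assumption: the conditions list directly follows the city div.
    conditions = link.find_parent('div').find_next_sibling('ul')
    first = conditions.select_one('li.ca-cond-firs') if conditions else None
    if first is None:
        return None
    img = first.find('img')
    status = img.get('title') if img else None
    return link.get_text(strip=True), first.get_text(strip=True), status


print(lookup_by_link_id(SAMPLE_HTML, '23185109'))
```

Because `*=` is a substring match, a longer id that merely starts with the same digits would match as well.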
Xelvoz
  • AMAZING! Works like a charm! Can you tell me what I should do to print only one city (if I wanted to)? – grissom Jun 30 '17 at 14:12
  • You can try this out, if you know that the city you seek exists in the cities list: `print("City: {}, Temperature: {}, Status: {}.".format(cities[cities.index("Londrina")], temps[cities.index("Londrina")], tempStatus[cities.index("Londrina")]))` – Xelvoz Jun 30 '17 at 14:23
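
An alternative to calling `cities.index("Londrina")` three times is to zip the three lists into a dictionary keyed by city name. This is only a sketch on the question's snippet; `weather_by_city` is a hypothetical helper name, and it assumes the three lists line up one to one.

```python
from bs4 import BeautifulSoup

# Snippet from the question, used here instead of the live page.
SAMPLE_HTML = """
<div class="ca-cidade"><a href="/site/internas/conteudo/meteorologia/grafico.shtml?id=23185109">Londrina</a></div>
<ul class="ca-condicoes">
<li class="ca-cond-firs"><img src="/site/imagens/icones_condicoes/temperatura/temp_baixa.png" title="Temperatura em declínio"/><br/>23.1°C</li>
</ul>
"""


def weather_by_city(html):
    """Map each city name to a (temperature, status) tuple."""
    soup = BeautifulSoup(html, 'html.parser')
    cities = [a.get_text(strip=True) for a in soup.select('div.ca-cidade > a')]
    temps = [li.get_text(strip=True) for li in soup.select('li.ca-cond-firs')]
    statuses = [img.get('title', '') for img in soup.select('li.ca-cond-firs > img')]
    # zip stops at the shortest list, so a missing icon would misalign entries.
    return {city: (temp, status)
            for city, temp, status in zip(cities, temps, statuses)}


data = weather_by_city(SAMPLE_HTML)
temp, status = data.get('Londrina', (None, None))
print("City: {}, Temperature: {}, Status: {}.".format('Londrina', temp, status))
```

Using `data.get` with a default avoids the ValueError that `list.index` raises when the city is not in the list.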