
I'm learning how to use BeautifulSoup and I'm trying to read the weather from Google. I'm using this URL.

I'm getting a 'KeyError: "id"' error on the line:

if span.attrs["id"] == "wob_tm":

What does this mean and how can I solve this problem?

I got the same error when I specified a different attribute, "class", so I thought the problem might be with the term "class" itself, but I still receive the error no matter which attribute I use.

# Creates a list containing all appearances of the 'span' tag
# The weather value is located within a span tag
spans = soup.find_all("span")

for span in spans:

    if span.attrs["id"] == "wob_tm":

        print(span.content)

I expect the output to be the integer value of the weather but when I run the code I just get: "KeyError: 'id'"

Dmitriy Zub

4 Answers


Some span tags don't have an id attribute at all, so span.attrs["id"] raises a KeyError as soon as the loop reaches one of them. You can refine your search instead:

spans = soup.find_all('span', {'id': 'wob_tm'})

This finds only the spans whose id matches. You can then print them all:

for span in spans:
    print(span.text)  # .text returns the text inside the tag
Ofer Sadan
    Alternatively using css selectors: `span = soup.select_one('span#wob_tm')`... it assumes the id is unique, but then things get messy if IDs are duplicated anyway... – Jon Clements Oct 22 '19 at 12:21
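
To make the difference concrete, here is a minimal sketch on a small inline HTML snippet rather than the live Google page. The markup below is a hypothetical stand-in (only the wob_tm id comes from the question), and it uses Python's built-in html.parser so no extra parser is needed.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page: only one span has an id.
html = """
<div>
  <span class="label">Temperature</span>
  <span id="wob_tm">60</span>
  <span>°F</span>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Indexing attrs directly fails on the first span, which has no id.
first_span = soup.find("span")
try:
    first_span.attrs["id"]
    raised = False
except KeyError:
    raised = True

# Filtering inside the search skips spans without a matching id.
matches = soup.find_all("span", {"id": "wob_tm"})
by_filter = [span.text for span in matches]

# The CSS selector form returns the first match, or None if nothing matches.
by_selector = soup.select_one("span#wob_tm").text

print(raised, by_filter, by_selector)
```

Both searches reach the same span; the filter form returns a list, while select_one() returns a single tag.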

The problem is that some spans have no 'id' key in their 'attrs' dictionary. The code below handles this case by using dict.get(), which returns None instead of raising KeyError.

spans = soup.find_all("span")
for span in spans:
    if span.attrs.get("id") == "wob_tm":
        print(span.text)  # .text returns the text inside the tag
    else:
        print('not wob_tm')
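
The same behaviour can be seen without BeautifulSoup at all, since attrs is an ordinary dictionary. This is only a sketch: the dictionaries below are hypothetical examples shaped like Tag.attrs, and has_id is a helper name introduced for illustration.

```python
# Hypothetical attribute dictionaries, shaped like Tag.attrs for three spans.
span_attrs = [
    {"class": ["label"]},  # no "id" key at all
    {"id": "wob_tm"},
    {},
]


def has_id(attrs, wanted):
    # dict.get returns None instead of raising KeyError when the key is missing.
    return attrs.get("id") == wanted


matches = [attrs for attrs in span_attrs if has_id(attrs, "wob_tm")]
print(matches)
```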
balderman

Although the other answers are legitimate, none of them will work in this case, because the temperature is probably loaded with JavaScript, so the spans you're looking for won't be found. Instead, you can use Selenium, which works for sure. For example:

from selenium import webdriver

driver = webdriver.Chrome()  # assumes chromedriver is on PATH; recent Selenium 4 releases no longer accept a driver path here
driver.get("https://www.google.co.uk/search?sxsrf=ACYBGNSfZJRq-EqvQ7rSC0oFZW-FiL-S-Q%3A1571602469929&source=hp&ei=JcCsXb-ANoK4kwWgtK_4DQ&q=what%27s+the+weather+today&oq=whats+the+weather+&gs_l=psy-ab.3.0.0i10i70i256j0i10j0j0i10l3j0l3j0i10.663.2962..4144...0.0..0.82.1251.19......0....1..gws-wiz.....10..35i362i39j35i39j0i131.AWESAgn5njA")

temp = driver.find_element("id", "wob_tm").text  # find_element_by_id() was removed in Selenium 4
print(temp)
Kostas Charitidis

Contrary to what Kostas Charitidis suggested, the weather data is not rendered with JavaScript.

You don't need to specify the <span> element, and moreover you don't need find_all()/findAll()/select(), since you're looking for a single element whose id doesn't repeat anywhere else. Use select_one() instead:

temperature = soup.select_one('#wob_tm').text
print(temperature)

You can also use try/except if you want to fall back to None when the element is missing:

try:
    temperature = soup.select_one('#wob_tm').text
except AttributeError:  # select_one() returned None
    temperature = None

An if statement costs a little on every run, whereas setting up a try/except block is nearly free. When an exception actually occurs, though, handling it costs much more.
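
For comparison, the explicit-check variant can be wrapped in a small helper. This is a sketch under assumptions: text_or_none is a hypothetical name, and the one-line HTML string stands in for the real response.

```python
from bs4 import BeautifulSoup


def text_or_none(soup, selector):
    """Return the stripped text of the first match, or None if nothing matches."""
    element = soup.select_one(selector)
    return element.get_text(strip=True) if element is not None else None


# Stand-in document: it contains #wob_tm but not #wob_hm.
soup = BeautifulSoup('<span id="wob_tm">60</span>', "html.parser")
temperature = text_or_none(soup, "#wob_tm")
humidity = text_or_none(soup, "#wob_hm")
print(temperature, humidity)
```

Which form reads better is mostly a matter of taste; the helper keeps the None handling in one place when several selectors are queried.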

Another possible cause of that error is a missing user-agent: without one, Google will eventually block your request, and you'll receive completely different HTML that does not contain the element. I have already answered what a user-agent is elsewhere.


Code and full example in the online IDE:

from bs4 import BeautifulSoup
import requests, lxml

headers = {
  "User-Agent":
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}

params = {
  "q": "london weather",
  "hl": "en",
  "gl": "us"
}

response = requests.get('https://www.google.com/search', headers=headers, params=params)
soup = BeautifulSoup(response.text, 'lxml')

weather_condition = soup.select_one('#wob_dc').text
temperature = soup.select_one('#wob_tm').text
precipitation = soup.select_one('#wob_pp').text
humidity = soup.select_one('#wob_hm').text
wind = soup.select_one('#wob_ws').text
current_time = soup.select_one('#wob_dts').text

print(f'Weather condition: {weather_condition}\n'
      f'Temperature: {temperature}°F\n'
      f'Precipitation: {precipitation}\n'
      f'Humidity: {humidity}\n'
      f'Wind speed: {wind}\n'
      f'Current time: {current_time}\n')

----
'''
Weather condition: Mostly cloudy
Temperature: 60°F
Precipitation: 3%
Humidity: 77%
Wind speed: 3 mph
Current time: Friday 7:00 AM
'''

Alternatively, you can achieve this by using the Google Direct Answer Box API from SerpApi. It's a paid API with a free plan.

The difference in your case is that you don't have to figure out how to extract the elements, since that is already done for you, and you don't need to maintain a parser over time. All you need to do is iterate over the structured JSON and pick out what you were looking for.

Code to integrate:

from serpapi import GoogleSearch
import os

params = {
  "engine": "google",
  "q": "london weather",
  "api_key": os.getenv("API_KEY"),
  "hl": "en",
  "gl": "us",
}

search = GoogleSearch(params)
results = search.get_dict()

loc = results['answer_box']['location']
weather_date = results['answer_box']['date']
weather = results['answer_box']['weather']
temp = results['answer_box']['temperature']
precipitation = results['answer_box']['precipitation']
humidity = results['answer_box']['humidity']
wind = results['answer_box']['wind']

print(f'{loc}\n{weather_date}\n{weather}\n{temp}°F\n{precipitation}\n{humidity}\n{wind}\n')

-------
'''
District 3
Friday
Mostly sunny
80°F
0%
52%
5 mph
'''
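
If the answer box might be absent from a response, the same dict.get() idea applies to the returned dictionary. The sketch below uses a hand-written dictionary shaped like the keys and sample values shown above instead of a live API call; "dew_point" is a hypothetical key used only to show the fallback.

```python
# Hypothetical response, shaped like the keys and sample values shown above.
results = {
    "answer_box": {
        "location": "District 3",
        "date": "Friday",
        "weather": "Mostly sunny",
        "temperature": "80",
        "precipitation": "0%",
        "humidity": "52%",
        "wind": "5 mph",
    }
}

# Fall back to an empty dict so a missing answer box does not raise KeyError.
answer_box = results.get("answer_box", {})
temp = answer_box.get("temperature")
dew_point = answer_box.get("dew_point", "n/a")  # key absent in this sample
print(temp, dew_point)
```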

Disclaimer: I work for SerpApi.

Dmitriy Zub