I want to create a program that uses BeautifulSoup to retrieve current weather data from the Google search results page. I've tried finding the correct HTML element with the .select() method, by class and by id (.wob_t, #wob_tm), but the result is empty, as if these elements don't exist. How can I fix this?

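import bs4
import requests

def search(city):
    # Download the Google results page for "<city> weather".
    pagedownload = requests.get('https://www.google.com/search?q=' + city + '+' + 'weather')
    pagedownload.raise_for_status()
    pagehtml = bs4.BeautifulSoup(pagedownload.text, 'html.parser')
    # Select every element with the class wob_t.
    htmlline = pagehtml.select('.wob_t')
    print(len(htmlline))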

The output:

0

2 Answers

Google takes various steps to make it difficult for bots to scrape its site directly, which makes it hard to build a stable scraper that relies on Google. In addition, scraping is likely to fall outside the site's terms of service.

They do, however, provide a number of APIs for accessing their data directly. Unfortunately, their weather API is deprecated (see: google api to get weather information).

I would consider using an authorised API (from the link above) to get weather data.

moo

Have a look at the SelectorGadget Chrome extension, which lets you grab CSS selectors by clicking on the desired element in your browser.

I assume you want to extract the wind speed and direction forecast. If so, select the container elements first, then iterate over them and extract the .wob_t element from each one:

for wind_speed_direction in soup.select('.wob_noe .wob_hw'):
    wind_speed = wind_speed_direction.select_one('.wob_t').text
    wind_direction = ' '.join(wind_speed_direction.select_one('.wob_t')['aria-label'].split(' ')[2:4])
    print(f"Wind Speed: {wind_speed}\nWind Direction: {wind_direction}\n")

-------
'''
Wind Speed: 3 mph
Wind Direction: From west

Wind Speed: 2 mph
Wind Direction: From southwest
'''
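The [2:4] slice in the loop above only works if the aria-label has the shape that the sample output implies, roughly "3 mph From west ...". Here is a minimal sketch of that step as a standalone function, assuming that label format. The function name parse_wind_label and the sample strings are illustrative and not taken from Google's markup. It uses split() rather than split(' ') so that repeated whitespace does not shift the words.

```python
def parse_wind_label(label):
    """Split an aria-label such as '3 mph From west' into (speed, direction).

    Assumes the label starts with '<value> <unit>' followed by two
    direction words; the direction is None if those words are missing.
    """
    words = label.split()
    speed = ' '.join(words[:2])
    direction = ' '.join(words[2:4]) or None
    return speed, direction


print(parse_wind_label('3 mph From west'))
print(parse_wind_label('2 mph'))
```

If Google changes the label wording, the direction will come back wrong rather than raise an error, so it is worth checking the result against the page from time to time.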

Also, make sure you send a User-Agent HTTP header. The default requests user-agent identifies the client as a script, so Google may serve different markup, and it will eventually block the requests:

headers = {
  "User-Agent":
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}

requests.get('YOUR_URL', headers=headers)
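To check what would actually be sent without hitting Google, the request can be built and inspected first. The sketch below uses the standard library's urllib instead of requests purely so that it runs without a network call. The helper name build_request is hypothetical, and the hl/gl values are the ones used elsewhere in this answer. With requests, passing params= performs the same encoding.

```python
from urllib.parse import urlencode
from urllib.request import Request

HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582",
}


def build_request(query):
    """Return an unsent Request for a Google search with a browser User-Agent."""
    # urlencode escapes spaces and special characters in the query string.
    params = urlencode({"q": query, "hl": "en", "gl": "us"})
    return Request("https://www.google.com/search?" + params, headers=HEADERS)


req = build_request("phagwara weather")
print(req.full_url)
# urllib stores header names with only the first letter capitalized.
print(req.get_header("User-agent"))
```

Encoding the query this way also avoids the manual '+' concatenation from the question, which breaks for city names containing spaces or special characters.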

Code and full example in the online IDE:

from bs4 import BeautifulSoup
import requests, lxml

headers = {
  "User-Agent":
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}

params = {
  "q": "phagwara weather",
  "hl": "en",
  "gl": "us"
}

response = requests.get('https://www.google.com/search', headers=headers, params=params)
soup = BeautifulSoup(response.text, 'lxml')

weather_condition = soup.select_one('#wob_dc').text
temperature = soup.select_one('#wob_tm').text
precipitation = soup.select_one('#wob_pp').text
humidity = soup.select_one('#wob_hm').text
wind = soup.select_one('#wob_ws').text
current_time = soup.select_one('#wob_dts').text

print(f'Weather condition: {weather_condition}\n'
      f'Temperature: {temperature}°F\n'
      f'Precipitation: {precipitation}\n'
      f'Humidity: {humidity}\n'
      f'Wind speed: {wind}\n'
      f'Current time: {current_time}\n')

-----------
'''
Weather condition: Mostly cloudy
Temperature: 88°F
Precipitation: 24%
Humidity: 71%
Wind speed: 2 mph
Current time: Thursday 12:00 PM
'''
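One caveat with the example above: select_one() returns None when nothing matches, so calling .text on the result raises AttributeError if Google serves a page without the weather box. Below is a minimal sketch of a more defensive version. The helper names extract_weather and missing_fields are hypothetical, and the selectors are simply the ones from the example, which may stop matching if Google changes its markup.

```python
# CSS selectors taken from the example above.
SELECTORS = {
    "condition": "#wob_dc",
    "temperature": "#wob_tm",
    "precipitation": "#wob_pp",
    "humidity": "#wob_hm",
    "wind": "#wob_ws",
    "time": "#wob_dts",
}


def extract_weather(soup, selectors=SELECTORS):
    """Map each field name to its element text, or None if the element is absent.

    `soup` can be a BeautifulSoup object or anything else with select_one().
    """
    result = {}
    for name, css in selectors.items():
        element = soup.select_one(css)  # None when nothing matches
        result[name] = element.text.strip() if element is not None else None
    return result


def missing_fields(weather):
    """List the fields that were not found on the page."""
    return [name for name, value in weather.items() if value is None]
```

With the soup object from the example, weather = extract_weather(soup) gives a plain dict, and a non-empty missing_fields(weather) is a hint that the response did not contain the expected weather markup.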

Alternatively, you can use the Google Direct Answer Box API from SerpApi. It's a paid API with a free plan.

The difference in your case is that you don't need to figure out which element to select or how to bypass blocks from Google, since that is already handled for you. All that remains is to read the fields you want from the structured JSON.

Code to integrate:

from serpapi import GoogleSearch
import os

params = {
  "engine": "google",
  "q": "phagwara weather",
  "api_key": os.getenv("API_KEY"),
  "hl": "en",
  "gl": "us",
}

search = GoogleSearch(params)
results = search.get_dict()

loc = results['answer_box']['location']
weather_date = results['answer_box']['date']
weather = results['answer_box']['weather']
temp = results['answer_box']['temperature']
precipitation = results['answer_box']['precipitation']
humidity = results['answer_box']['humidity']
wind = results['answer_box']['wind']

print(f'{loc}\n{weather_date}\n{weather}\n{temp}°F\n{precipitation}\n{humidity}\n{wind}\n')

---------
'''
Phagwara, Punjab, India
Thursday 12:00 PM
Mostly cloudy
88°F
24%
70%
2 mph
'''
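The code above indexes results['answer_box'] directly, which raises KeyError if a response has no answer box, presumably the case for queries that do not trigger the weather widget. Here is a minimal sketch of a more tolerant lookup, assuming the same field names as above. The helper read_answer_box is hypothetical, and the sample dict is hand-written from the output shown, not a real API response.

```python
# Field names as used in the example above.
FIELDS = ("location", "date", "weather", "temperature",
          "precipitation", "humidity", "wind")


def read_answer_box(results):
    """Return the weather fields from a results dict, with None for missing keys."""
    answer_box = results.get("answer_box") or {}
    return {field: answer_box.get(field) for field in FIELDS}


# Hand-written sample mirroring the output shown above, not a real API response.
sample = {
    "answer_box": {
        "location": "Phagwara, Punjab, India",
        "date": "Thursday 12:00 PM",
        "weather": "Mostly cloudy",
        "temperature": "88",
        "precipitation": "24%",
        "humidity": "70%",
        "wind": "2 mph",
    }
}

print(read_answer_box(sample))
print(read_answer_box({}))
```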

Disclaimer: I work for SerpApi.

Dmitriy Zub