Have a look at the SelectorGadget Chrome extension, which lets you grab CSS
selectors by clicking on the desired element in your browser.
I assume you want to extract the wind speed and direction forecast data. If so, select the container elements first, then iterate over them and extract the .wob_t element from each one:
for wind_speed_direction in soup.select('.wob_noe .wob_hw'):
    wind_speed = wind_speed_direction.select_one('.wob_t').text
    wind_direction = ' '.join(wind_speed_direction.select_one('.wob_t')['aria-label'].split(' ')[2:4])
    print(f"Wind Speed: {wind_speed}\nWind Direction: {wind_direction}\n")
-------
'''
Wind Speed: 3 mph
Wind Direction: From west
Wind Speed: 2 mph
Wind Direction: From southwest
'''
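The slicing of the aria-label above can be isolated into a small helper so that a short or empty label does not silently yield a wrong value. This is only a sketch: the function name parse_wind_label and the sample labels are made up for illustration, and the label layout ("<speed> <unit> From <direction> ...") is an assumption inferred from the output shown above, not something Google guarantees.

```python
def parse_wind_label(label):
    """Split a label such as '3 mph From west Thursday 12:00 PM'.

    Assumed layout (inferred, not guaranteed): '<speed> <unit> From <direction> ...'.
    Returns (speed, direction), with None for any part that is missing.
    """
    parts = label.split()
    speed = " ".join(parts[0:2]) if len(parts) >= 2 else None
    direction = " ".join(parts[2:4]) if len(parts) >= 4 else None
    return speed, direction


# Hypothetical sample labels, modelled on the output shown above.
samples = [
    "3 mph From west Thursday 12:00 PM",
    "2 mph From southwest Thursday 1:00 PM",
    "",  # e.g. the attribute was present but empty
]

for label in samples:
    print(parse_wind_label(label))
```

In the loop above, you could pass the element's aria-label value to this helper instead of slicing it inline.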
Also, make sure you send a User-Agent HTTP header with your requests; otherwise, Google will eventually block them:
headers = {
    "User-Agent":
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}
requests.get('YOUR_URL', headers=headers)
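If you want to check that the header is really attached before sending anything, you can build the request object without dispatching it. The sketch below uses only the standard library (urllib) so that it runs without extra packages; the helper name build_search_request and the query "example weather" are placeholders, and the same headers dict works with requests.get as shown above.

```python
from urllib.parse import urlencode
from urllib.request import Request

USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
)


def build_search_request(query, user_agent):
    """Build (but do not send) a search request carrying a custom User-Agent."""
    url = "https://www.google.com/search?" + urlencode({"q": query, "hl": "en"})
    return Request(url, headers={"User-Agent": user_agent})


# "example weather" is a placeholder query, not taken from the question.
req = build_search_request("example weather", USER_AGENT)

print(req.full_url)
# urllib stores header names capitalized as "User-agent", hence the lookup key.
print(req.get_header("User-agent"))
```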
Code and full example in the online IDE:
from bs4 import BeautifulSoup
import requests  # the 'lxml' parser used below must be installed as well

headers = {
    "User-Agent":
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}

# Query parameters: search query, interface language, and country
params = {
    "q": "phagwara weather",
    "hl": "en",
    "gl": "us"
}

response = requests.get('https://www.google.com/search', headers=headers, params=params)
soup = BeautifulSoup(response.text, 'lxml')

# Each value lives in an element with its own id inside the weather box
weather_condition = soup.select_one('#wob_dc').text
temperature = soup.select_one('#wob_tm').text
precipitation = soup.select_one('#wob_pp').text
humidity = soup.select_one('#wob_hm').text
wind = soup.select_one('#wob_ws').text
current_time = soup.select_one('#wob_dts').text

print(f'Weather condition: {weather_condition}\n'
      f'Temperature: {temperature}°F\n'
      f'Precipitation: {precipitation}\n'
      f'Humidity: {humidity}\n'
      f'Wind speed: {wind}\n'
      f'Current time: {current_time}\n')
-----------
'''
Weather condition: Mostly cloudy
Temperature: 88°F
Precipitation: 24%
Humidity: 71%
Wind speed: 2 mph
Current time: Thursday 12:00 PM
'''
Alternatively, you can use the Google Direct Answer Box API from SerpApi. It's a paid API with a free plan.
The difference in your case is that you don't need to figure out which elements to select or how to bypass blocks from Google, since that's already handled for you. All that's left is to iterate over the structured JSON and pick out the data you want.
Code to integrate:
from serpapi import GoogleSearch
import os
params = {
    "engine": "google",
    "q": "phagwara weather",
    "api_key": os.getenv("API_KEY"),
    "hl": "en",
    "gl": "us",
}
search = GoogleSearch(params)
results = search.get_dict()
loc = results['answer_box']['location']
weather_date = results['answer_box']['date']
weather = results['answer_box']['weather']
temp = results['answer_box']['temperature']
precipitation = results['answer_box']['precipitation']
humidity = results['answer_box']['humidity']
wind = results['answer_box']['wind']
print(f'{loc}\n{weather_date}\n{weather}\n{temp}°F\n{precipitation}\n{humidity}\n{wind}\n')
---------
'''
Phagwara, Punjab, India
Thursday 12:00 PM
Mostly cloudy
88°F
24%
70%
2 mph
'''
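Direct indexing such as results['answer_box']['wind'] raises a KeyError when the response has no answer box or a field is absent. A minimal sketch of a more defensive lookup follows; the helper name summarize_answer_box and the "n/a" fallback are my own choices, and the sample dict is hand-written from the output above rather than a real API response.

```python
FIELDS = ("location", "date", "weather", "temperature",
          "precipitation", "humidity", "wind")


def summarize_answer_box(results):
    """Return the weather fields as a dict, or None if there is no answer box."""
    box = results.get("answer_box")
    if not box:
        return None
    # Fall back to "n/a" for any field missing from the answer box.
    return {name: box.get(name, "n/a") for name in FIELDS}


# Hand-written sample modelled on the printed output above (not a real response).
sample_results = {
    "answer_box": {
        "location": "Phagwara, Punjab, India",
        "date": "Thursday 12:00 PM",
        "weather": "Mostly cloudy",
        "temperature": "88",
        "precipitation": "24%",
        "humidity": "70%",
        # "wind" deliberately left out to show the fallback
    }
}

summary = summarize_answer_box(sample_results)
print(summary)
print(summarize_answer_box({}))
```

With the real client, you would pass the dict returned by search.get_dict() to this helper.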
Disclaimer: I work for SerpApi.