- A default Google search URL doesn't contain the # symbol. Instead, it should use the /search pathname followed by ? and the query string:
---> https://google.com/#q= (wrong)
---> https://www.google.com/search?q=cake (correct)
- Make sure you're passing a user-agent in the HTTP request headers. The default requests user-agent is python-requests, so sites can identify the request as coming from a bot and block it. In that case you receive different HTML, typically some sort of error page with different elements/selectors, which is why you were getting an empty result.
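The URL point in the first bullet can be checked with the standard library alone. This is a minimal sketch, not part of the original script: `build_search_url` is a hypothetical helper name, and the example only builds and inspects URLs without sending any request. It also shows why the # form fails: everything after # is a fragment, which is not sent to the server.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_search_url(query: str) -> str:
    """Build a Google search URL with the /search path and a ?q= query string."""
    # build_search_url is a hypothetical helper used only for illustration.
    return "https://www.google.com/search?" + urlencode({"q": query})


good = urlsplit(build_search_url("cake"))
bad = urlsplit("https://google.com/#q=cake")

print(good.path, parse_qs(good.query))  # path and parsed query of the correct URL
print(repr(bad.query), bad.fragment)    # the query is empty; q=cake sits in the fragment
```

When using requests, passing `params=` (as in the full script further down) produces the same kind of query string for you.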
Check what your user-agent is, and see a list of user-agents for mobile, tablets, etc.
Pass the user-agent in the request headers:
import requests

headers = {
    'User-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36'
}
requests.get('YOUR_URL', headers=headers)
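To illustrate why the default header gives the request away, here is a rough sketch of the kind of check a server could run. It is purely an assumption for illustration: `looks_like_default_client` and the prefix list are hypothetical, and real bot detection is far more involved than a prefix match.

```python
# Hypothetical heuristic: prefixes of the default User-Agent strings
# sent by a few common HTTP clients (compared in lowercase).
DEFAULT_CLIENT_PREFIXES = ("python-requests/", "python-urllib/", "curl/")


def looks_like_default_client(user_agent: str) -> bool:
    """Return True if the User-Agent looks like an HTTP library default."""
    return user_agent.strip().lower().startswith(DEFAULT_CLIENT_PREFIXES)


browser_ua = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36"
)

print(looks_like_default_client("python-requests/2.28.1"))  # flagged as a library default
print(looks_like_default_client(browser_ua))                # looks like a browser
```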
Code and example in the online IDE:
from bs4 import BeautifulSoup
import json
import requests  # the 'lxml' parser used below must be installed (pip install lxml)

headers = {
    'User-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36'
}

# https://requests.readthedocs.io/en/latest/user/quickstart/#passing-parameters-in-urls
params = {
    'q': 'tesla',  # query
    'gl': 'us',    # country to search from
    'hl': 'en'     # language
}

# https://requests.readthedocs.io/en/latest/user/quickstart/#timeouts
html = requests.get("https://www.google.com/search", headers=headers, params=params, timeout=30)
soup = BeautifulSoup(html.text, 'lxml')

data = []

for result in soup.select('.tF2Cxc'):
    title = result.select_one('.DKV0Md').text
    link = result.select_one('.yuRUbf a')['href']

    # sometimes there's no description: select_one() then returns None
    # and accessing .text raises AttributeError
    try:
        snippet = result.select_one('#rso .lyLwlc').text
    except AttributeError:
        snippet = None

    data.append({
        'title': title,
        'link': link,
        'snippet': snippet
    })

print(json.dumps(data, indent=2, ensure_ascii=False))
-------------
'''
[
{
"title": "Tesla: Electric Cars, Solar & Clean Energy",
"link": "https://www.tesla.com/",
"snippet": "Tesla is accelerating the world's transition to sustainable energy with electric cars, solar and integrated renewable energy solutions for homes and ..."
},
{
"title": "Tesla, Inc. - Wikipedia",
"link": "https://en.wikipedia.org/wiki/Tesla,_Inc.",
"snippet": "Tesla, Inc. is an American electric vehicle and clean energy company based in Palo Alto, California, United States. Tesla designs and manufactures electric ..."
},
{
"title": "Nikola Tesla - Wikipedia",
"link": "https://en.wikipedia.org/wiki/Nikola_Tesla",
"snippet": "Nikola Tesla was a Serbian-American inventor, electrical engineer, mechanical engineer, and futurist best known for his contributions to the design of the ..."
}
]
'''
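The try/except around the snippet in the script above can also be written as a small helper that checks for a missing element explicitly. This is a sketch under the assumption that `element` is a BeautifulSoup tag (or anything else exposing `select_one()`); `text_or_none` is a hypothetical name, while `select_one()` and `get_text()` are regular BeautifulSoup methods.

```python
from typing import Optional


def text_or_none(element, selector: str) -> Optional[str]:
    """Return the stripped text of the first match for selector, or None if nothing matches."""
    # text_or_none is a hypothetical helper; element is assumed to be a bs4 Tag.
    match = element.select_one(selector)
    if match is None:
        return None
    return match.get_text(strip=True)
```

Inside the loop it would be used as `snippet = text_or_none(result, '.lyLwlc')`, which keeps a missing description from raising while letting unrelated errors surface.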
Alternatively, you can achieve the same thing by using the Google Organic Results API from SerpApi. It's a paid API with a free plan for testing.
The difference in your case is that you don't have to figure out why the output is empty, bypass blocks from Google or other search engines, or maintain the parser over time.
Instead, you only need to pick the data you want from the structured JSON.
Example code to integrate:
import os
from serpapi import GoogleSearch

params = {
    "engine": "google",              # serpapi parsing engine
    "q": "tesla",                    # search query
    "hl": "en",                      # language of the search
    "gl": "us",                      # country from where search initiated
    "api_key": os.getenv("API_KEY")  # your serpapi API key
}

search = GoogleSearch(params)  # data extraction on the SerpApi backend
results = search.get_dict()    # JSON -> Python dict

for result in results["organic_results"]:
    print(f"Title: {result['title']}\nSummary: {result['snippet']}\nLink: {result['link']}\n")
----------
'''
Title: Tesla: Electric Cars, Solar & Clean Energy
Summary: Tesla is accelerating the world's transition to sustainable energy with electric cars, solar and integrated renewable energy solutions for homes and ...
Link: https://www.tesla.com/
Title: Tesla, Inc. - Wikipedia
Summary: Tesla, Inc. is an American electric vehicle and clean energy company based in Palo Alto, California, United States. Tesla designs and manufactures electric ...
Link: https://en.wikipedia.org/wiki/Tesla,_Inc.
'''
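If a result comes back without one of the keys (a snippet may be missing, just as in the HTML version), indexing with `result['snippet']` raises a KeyError. A defensive variant could look like the sketch below. `format_result` is a hypothetical helper, and `sample` is hand-written data shaped like the output above, not a real API response.

```python
def format_result(result: dict) -> str:
    """Format one organic result, tolerating missing keys."""
    title = result.get("title", "")
    snippet = result.get("snippet") or "(no snippet)"
    link = result.get("link", "")
    return f"Title: {title}\nSummary: {snippet}\nLink: {link}\n"


# Hand-written sample in the shape of results["organic_results"];
# the second entry deliberately has no "snippet" key.
sample = {
    "organic_results": [
        {
            "title": "Tesla: Electric Cars, Solar & Clean Energy",
            "link": "https://www.tesla.com/",
            "snippet": "Tesla is accelerating the world's transition to sustainable energy ...",
        },
        {
            "title": "Tesla, Inc. - Wikipedia",
            "link": "https://en.wikipedia.org/wiki/Tesla,_Inc.",
        },
    ]
}

for result in sample.get("organic_results", []):
    print(format_result(result))
```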
Disclaimer: I work for SerpApi.