You can scrape the website link and description of each Google Search result using the BeautifulSoup web scraping library.
The code below relies on CSS selectors, so it is worth reading up on what CSS selectors are and the cons of using them.
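As a small offline illustration of the selector calls used in this answer, and of one con of relying on class names: `select()` returns a list of all matches, while `select_one()` returns the first match or `None`. The HTML snippet and its class names (`result`, `snippet`, `missing`) are made up for this sketch and are not Google's markup.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; not Google's real HTML.
SAMPLE_HTML = """
<div class="result"><a href="https://example.com/a">A</a><span class="snippet">First</span></div>
<div class="result"><a href="https://example.com/b">B</a></div>
"""

# "html.parser" ships with Python, so this sketch does not need lxml.
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

results = soup.select(".result")           # list of every matching element
first_link = soup.select_one(".result a")  # first matching element only
missing = soup.select_one(".missing")      # None, because nothing has this class

# Collect snippet text per result, keeping None where the element is absent.
snippets = []
for result in results:
    node = result.select_one(".snippet")
    snippets.append(node.text if node is not None else None)
```

If a class name is renamed or removed, `select_one()` returns `None`, so checking for `None` before reading `.text` or `["href"]` avoids a crash.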
Check the code in an online IDE.
from bs4 import BeautifulSoup
import requests, lxml, json  # lxml is imported only to fail early if the parser is missing

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36",
}

# https://docs.python-requests.org/en/master/user/quickstart/#passing-parameters-in-urls
# these URL params are taken from the actual Google search URL
# and transformed to a more readable format
params = {
    "q": "python web scrape google",  # query
    "gl": "us",                       # country to search from
    "hl": "en",                       # language
}

html = requests.get("https://www.google.com/search", headers=headers, params=params, timeout=30)
soup = BeautifulSoup(html.text, "lxml")

website_description_data = []

# each ".tF2Cxc" element is one organic search result
for result in soup.select(".tF2Cxc"):
    link = result.select_one(".yuRUbf a")
    snippet = result.select_one(".lEBKkf")

    # select_one() returns None when nothing matches, so skip incomplete results
    if link is None or snippet is None:
        continue

    website_description_data.append({
        "website_name": link["href"],
        "description": snippet.text
    })

print(json.dumps(website_description_data, indent=2))
Example output
[
  {
    "website_name": "https://practicaldatascience.co.uk/data-science/how-to-scrape-google-search-results-using-python",
    "description": "Mar 13, 2021 \u2014 First, we're using urllib.parse.quote_plus() to URL encode our search query. This will add + characters where spaces sit and ensure that the\u00a0..."
  },
  {
    "website_name": "https://stackoverflow.com/questions/38619478/google-search-web-scraping-with-python",
    "description": "You can always directly scrape Google results. To do this, you can use the URL https://google.com/search?q=<Query> this will return the top\u00a0..."
  }
  # ...
]
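To make the `params` step in the code above more concrete, here is a minimal stdlib-only sketch of how such a dictionary ends up in the request URL, and how a search URL can be turned back into a readable dictionary. It uses `urllib.parse.urlencode` as an approximation of the encoding that `requests` applies; the helper name `build_search_url` is hypothetical and not part of any library.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_search_url(base_url: str, params: dict) -> str:
    """Append URL-encoded params to base_url (hypothetical helper for illustration)."""
    return f"{base_url}?{urlencode(params)}"


params = {"q": "python web scrape google", "gl": "us", "hl": "en"}

# Spaces in the query are encoded as "+" characters.
url = build_search_url("https://www.google.com/search", params)
print(url)

# Reverse direction: recover a readable dict from a search URL.
# parse_qs() returns a list per key, so keep the first value of each.
recovered = {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}
print(recovered)
```

Keeping the parameters in a dictionary rather than in a hand-written URL makes it easier to change the query, country, or language without worrying about escaping.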