To scrape just the summary, you can use the select_one() method provided by bs4 with a CSS selector. You can use the SelectorGadget Chrome extension (or any similar tool) to pick the selector quickly.
Make sure you send a user-agent header; otherwise Google could block your request, because the default user-agent will be python-requests (if you are using the requests library). You can pick one from a list of real browser user-agents to fake a user visit.
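To see the difference between the default and a custom user-agent without contacting Google, you can prepare the requests locally and inspect their headers. This is only a sketch: the names URL, QUERY and CUSTOM_UA are made up for illustration, and the user-agent string is the one used in the full example of this answer.

```python
import requests

# Illustrative constants (not part of any API).
URL = "https://www.google.com/search"
QUERY = {"q": "who is donald trump"}
CUSTOM_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
)

session = requests.Session()

# prepare_request() builds the request without sending it,
# so no network access is needed to inspect the headers.
default_request = session.prepare_request(
    requests.Request("GET", URL, params=QUERY)
)
custom_request = session.prepare_request(
    requests.Request("GET", URL, params=QUERY, headers={"User-Agent": CUSTOM_UA})
)

print(default_request.headers["User-Agent"])  # the library default, python-requests/<version>
print(custom_request.headers["User-Agent"])   # the browser-like string set above
```

The first line printed is what Google would see if you passed no headers at all, which is why the request is easy to identify as automated.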
From there you can scrape any other part you want with the select_one() method. Keep in mind that you can scrape info from the Knowledge Graph only if Google provides it for that query. You can use an if or try-except statement to handle the case where the element is missing.
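A minimal sketch of the if-check mentioned above. It relies on select_one() returning None when nothing matches. The two HTML fragments are hypothetical stand-ins for a results page with and without a Knowledge Graph, and extract_summary is a helper name invented for this example. The built-in html.parser is used here so the snippet does not depend on lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical fragments, not real Google markup.
WITH_SUMMARY = (
    '<div><h3 class="Uo8X3b">Description</h3>'
    "<span>Example summary text.</span></div>"
)
WITHOUT_SUMMARY = "<div><p>No knowledge graph here.</p></div>"


def extract_summary(html, selector=".Uo8X3b+ span"):
    """Return the summary text, or None when the selector matches nothing."""
    soup = BeautifulSoup(html, "html.parser")
    node = soup.select_one(selector)  # None if there is no match
    if node is None:
        return None
    return node.get_text(strip=True)


print(extract_summary(WITH_SUMMARY))
print(extract_summary(WITHOUT_SUMMARY))
```

Without the check, calling .text on the missing element raises AttributeError, which is the exception you would otherwise catch with try-except.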
Code and full example:
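from bs4 import BeautifulSoup
import requests

# Send a browser-like user-agent; the default python-requests one may get blocked.
headers = {
    "User-Agent":
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}

# Pass the query via params so that requests URL-encodes it.
html = requests.get('https://www.google.com/search', params={'q': 'who is donald trump'}, headers=headers).text
soup = BeautifulSoup(html, 'lxml')  # the 'lxml' parser requires the lxml package to be installed

# The summary is the span that directly follows the element with class Uo8X3b.
summary = soup.select_one('.Uo8X3b+ span').text
print(summary)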
Output:
Donald John Trump is an American media personality and businessman who served as the 45th president of the United States from 2017 to 2021.
Born and raised in Queens, New York City, Trump attended Fordham University and the University of Pennsylvania, graduating with a bachelor's degree in 1968.
Alternatively, you can use the Google Knowledge Graph API from SerpApi. It's a paid API with a free plan. Check out the playground to see if it suits your needs.
Example code to integrate:
import os
from serpapi import GoogleSearch

params = {
    "engine": "google",
    "q": "who is donald trump",
    "api_key": os.getenv("API_KEY"),  # read the key from the environment
}

search = GoogleSearch(params)
results = search.get_dict()

# The summary is stored under the knowledge_graph description.
summary = results["knowledge_graph"]["description"]
print(summary)
Output:
Donald John Trump is an American media personality and businessman who served as the 45th president of the United States from 2017 to 2021.
Born and raised in Queens, New York City, Trump attended Fordham University and the University of Pennsylvania, graduating with a bachelor's degree in 1968.
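The same caveat applies here: the description is available only when Google returns a Knowledge Graph for the query. Below is a small sketch of a defensive lookup using plain dictionaries in place of a real search.get_dict() result. The sample dictionaries and the get_description helper are made up for illustration, and the sketch assumes the "knowledge_graph" key is simply absent when there is no Knowledge Graph.

```python
# Made-up stand-ins for search.get_dict() results.
with_graph = {"knowledge_graph": {"description": "Example description."}}
without_graph = {"organic_results": []}


def get_description(results):
    """Return the knowledge graph description, or None if it is missing."""
    # .get() with a default avoids a KeyError when a key is absent.
    return results.get("knowledge_graph", {}).get("description")


print(get_description(with_graph))
print(get_description(without_graph))
```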
Disclaimer: I work for SerpApi.