I'm working currently on web scraping and I need to extract a description of a city in a google search result.
Let's say that I want a description of Madrid city, I searched and got the following result:
This is the source code for the target div:
<div jscontroller="GCSbhd" class="kno-rdesc" jsaction="seM7Qe:c0XUbe;Iigoee:c0XUbe;rcuQ6b:npT2md">
<h3 class="Uo8X3b OhScic zsYMMe">Description</h3>
<span>Située au centre de l'Espagne, Madrid, sa capitale, est une ville dotée d'élégants boulevards et de vastes parcs très bien entretenus comme le Retiro. Elle est réputée pour ses riches collections d'œuvres d'art européennes, avec notamment celles du musée du Prado, réalisées par Goya, Velázquez et d'autres maîtres espagnols. Au cœur de la vieille Madrid des Habsbourgs se trouve la Plaza Mayor, bordée de portiques, et, à proximité, le Palais royal baroque et son Armurerie, qui comporte des armes historiques.
<span>
<span class="eHaQD"> ― Google
</span>
</span>
</span>
</div>
I tried scraping the content and selecting the <h3>
tag and then select its sibling but the result is None
, this is the code used:
import requests
from bs4 import BeautifulSoup
url_PresMadrid = "https://www.google.com/search?q=madrid"
req_PresPadrid = requests.get(url_PresMadrid)
soup_PresMadrid = BeautifulSoup(req_PresPadrid.content, 'html.parser')
target_div_PresMadrid = soup_PresMadrid.find('h3', {'class': 'Uo8X3b OhScic zsYMMe'})
print(target_div_PresMadrid)
I even tried to select the only parent <div>
that doesn't change its class but the code returns None
as well, this the code for it:
import requests
from bs4 import BeautifulSoup
url_PresMadrid = "https://www.google.com/search?q=madrid"
req_PresPadrid = requests.get(url_PresMadrid)
soup_PresMadrid = BeautifulSoup(req_PresPadrid.content, 'html.parser')
target_div_PresMadrid = soup_PresMadrid.find('div', {'class': 'liYKde g VjDLd'})
print(target_div_PresMadrid)
Can anyone help me to understand the mechanics of the search engine so that I can extract that paragraph