You are dealing with a website that uses JavaScript to render its data once the page loads. One approach is to load the page source and pull out the script that contains the part you are looking for. Once that is parsed as JSON, you have a tree in the form of a dict, so you can do whatever you like with it.
import json

import requests
from bs4 import BeautifulSoup

r = requests.get("https://www.britannica.com/search?query=world+war+2")
soup = BeautifulSoup(r.text, 'html.parser')

# The script at index 15 held the JSON when this was written; the index
# depends on the page layout and may change.
script = soup.find_all(
    "script", {'type': 'text/javascript'})[15].get_text(strip=True)

# Slice from the first "{" to the last "}" to isolate the JSON object.
start = script.find("{")
end = script.rfind("}") + 1
data = script[start:end]

n = json.loads(data)
print(json.dumps(n, indent=4))
# print(n.keys())
# print(n["topicInfo"]["description"])
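The fixed index [15] in the snippet above depends on how many scripts the page happens to ship, so it can break when the layout changes. Below is a minimal sketch of a more tolerant lookup, assuming the target JSON object can be recognised by a known top-level key such as topicInfo. The helper name find_json_in_scripts and the sample script strings are made up for illustration; they are not part of the site or of BeautifulSoup.

```python
import json


def find_json_in_scripts(script_texts, required_key):
    """Return the first JSON object that has required_key, or None."""
    decoder = json.JSONDecoder()
    for text in script_texts:
        # Try every "{" as a possible start of a JSON object.
        pos = text.find("{")
        while pos != -1:
            try:
                obj, _ = decoder.raw_decode(text, pos)
            except json.JSONDecodeError:
                obj = None  # this "{" did not start valid JSON
            if isinstance(obj, dict) and required_key in obj:
                return obj
            pos = text.find("{", pos + 1)
    return None


# Made-up script bodies standing in for the page's inline scripts.
sample_scripts = [
    "window.dataLayer = window.dataLayer || [];",
    'var cfg = {"toc": [], "topicInfo": {"title": "World War II"}}; init(cfg);',
]
found = find_json_in_scripts(sample_scripts, "topicInfo")
print(found["topicInfo"]["title"])
```

With the soup object from the answer, the script bodies could be passed in as (s.string or "" for s in soup.find_all("script")). Unlike the find/rfind slice, raw_decode stops at the end of the first complete JSON value, so trailing JavaScript after the object does not matter.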
Output:
{
    "toc": [
        {
            "id": 1,
            "title": "Introduction",
            "url": "/event/World-War-II"
        },
        {
            "id": 53531,
            "title": "Axis initiative and Allied reaction",
            "url": "/event/World-War-II#ref53531"
        },
        {
            "id": 53563,
            "title": "The Allies\u2019 first decisive successes",
            "url": "/event/World-War-II/The-Allies-first-decisive-successes"
        },
        {
            "id": 53576,
            "title": "The Allied landings in Europe and the defeat of the Axis powers",
            "url": "/event/World-War-II/The-Allied-landings-in-Europe-and-the-defeat-of-the-Axis-powers"
        }
    ],
    "topicInfo": {
        "topicId": 648813,
        "imageId": 74903,
        "imageUrl": "https://cdn.britannica.com/s:300x1000/26/188426-050-2AF26954/Germany-Poland-September-1-1939.jpg",
        "imageAltText": "World War II",
        "title": "World War II",
        "identifier": "1939\u20131945",
        "description": "World War II, conflict that involved virtually every part of the world during the years 1939\u201345. The principal belligerents were the Axis powers\u2014Germany, Italy, and Japan\u2014and the Allies\u2014France, Great Britain, the United States, the Soviet Union, and, to a lesser extent, China. The war was in many...",
        "url": "/event/World-War-II"
    }
}
Output of print(n.keys()):
dict_keys(['toc', 'topicInfo'])
Output of print(n["topicInfo"]["description"]):
World War II, conflict that involved virtually every part of the world during the years 1939–45. The principal belligerents were the Axis powers—Germany, Italy, and Japan—and the Allies—France, Great Britain, the United States, the Soviet Union, and, to a lesser extent, China. The war was in many...
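As one example of working with the parsed dict, the relative url values under toc can be turned into full links. This is only a sketch: n here is a trimmed, hand-copied version of the output shown earlier, and BASE is an assumed site root, not something the page data provides.

```python
from urllib.parse import urljoin

# Trimmed copy of the parsed dict from the output above.
n = {
    "toc": [
        {"id": 1, "title": "Introduction", "url": "/event/World-War-II"},
        {"id": 53531, "title": "Axis initiative and Allied reaction",
         "url": "/event/World-War-II#ref53531"},
    ],
    "topicInfo": {"title": "World War II", "url": "/event/World-War-II"},
}

BASE = "https://www.britannica.com"  # assumed site root for relative URLs

# Pair each section title with its absolute URL.
links = [(item["title"], urljoin(BASE, item["url"])) for item in n["toc"]]
for title, url in links:
    print(f"{title}: {url}")
```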