I am trying to extract the "articleBody" parts of the HTML. I can get the contents of the script tags, but not the "articleBody" field inside them. Please find my code below:
import requests
from bs4 import BeautifulSoup as bs
from lxml import etree

url_req = "https://www.bbc.com/news/live/world-europe-61792068"
response = requests.get(url=url_req, verify=True)
soup = bs(response.text, "lxml")

# Serialize the parsed page, drop non-ASCII characters, and save it to disk
soup = str(soup).encode('ascii', 'ignore').decode('ascii')
with open('file.xml', 'w') as f:
    f.write(soup)

# Re-parse the saved file and collect every <script> tag
with open("file.xml") as fp:
    soup = bs(fp, "lxml")
df = soup.find_all("script")
Below is the result that I am hoping to get. There are many "articleBody" entries inside the script tags. I want the output to contain only the "articleBody" text, without any other parts of the script tags.
For example:
"articleBody":"Russia's aggression in Ukraine is a game-changer, Nato Secretary General Jens Stoltenberg has said. Stoltenberg has been speaking in Brussels where defence ministers from member countries of the military alliance and a handful of other allies have been meeting to discuss the situation in Ukraine. He says progress has been made in many areas and, in a meeting with the Ukrainian defence minister last night, they discussed the "imperative need for our continued support as Russia conducts a relentless war of attrition against Ukraine". Stoltenberg says Ukraine's allies have announced additional assistance, "including much-needed heavy weapons and long range systems" and also discussed plans to support the country for the longer term and to step up Nato's "presence,