I am trying to build a question bank by scraping this website:
https://www.neetprep.com/questions/851-Botany/7918-Living-World?courseId=386
I am using the following code:
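```python
from bs4 import BeautifulSoup as soup
from urllib.request import urlopen as uReq

my_url = 'https://www.neetprep.com/questions/851-Botany/7918-Living-World?courseId=386'

# Download the raw HTML; the with-block closes the connection automatically
with uReq(my_url) as uClient:
    page_html = uClient.read()

# Parse the page and collect every <span class="-PmH"> (each appears to hold one question)
page_soup = soup(page_html, "html.parser")
containers = page_soup.find_all("span", {"class": "-PmH"})

# Pretty-print the markup of the first question
print(containers[0].prettify())
```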
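The snippet above prints only the first match. A minimal sketch of collecting every question span from an already-downloaded HTML string follows. It assumes each question is rendered as a `<span class="-PmH">` like the one in the output. That class name looks auto-generated, so it may change. `collect_question_spans` is a hypothetical helper name, and the sample HTML is made up so the sketch runs offline.

```python
from bs4 import BeautifulSoup


def collect_question_spans(page_html):
    """Return every <span class="-PmH"> in page_html (hypothetical helper)."""
    page_soup = BeautifulSoup(page_html, "html.parser")
    # find_all is the current name for findAll; class_ avoids clashing with
    # the Python keyword "class" and matches any tag whose class list has "-PmH"
    return page_soup.find_all("span", class_="-PmH")


# Made-up sample HTML with two question spans, used instead of a live download
sample = (
    '<div><span class="-PmH" id="q1"><p>A?</p></span>'
    '<span class="-PmH" id="q2"><p>B?</p></span></div>'
)

for span in collect_question_spans(sample):
    print(span["id"])
```

With the script above, the same helper would be called as `containers = collect_question_spans(page_html)`.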
The output I get is:
```html
<span class="-PmH" id="questionUXVlc3Rpb246NzE3MQ==">
<p>
The third name in trinomial nomenclature is
</p>
<p>
(1) Species
</p>
<p>
(2) Subgenus
</p>
<p>
(3) Subspecies
</p>
<p>
(4) Ecotype
</p>
</span>
```
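Each `<p>` in this output is a separate child tag of the span, so one option is to read the text of each `<p>` individually. That keeps the question and its options apart, which may be convenient for a question bank. The sketch below is minimal and assumes the first `<p>` is always the question and the remaining ones are the options. That holds for this example but has not been checked against the rest of the site. The variable names are illustrative only.

```python
from bs4 import BeautifulSoup

# The span from the output above, pasted as a literal so the sketch runs offline
span_html = """<span class="-PmH" id="questionUXVlc3Rpb246NzE3MQ==">
<p>
The third name in trinomial nomenclature is
</p>
<p>
(1) Species
</p>
<p>
(2) Subgenus
</p>
<p>
(3) Subspecies
</p>
<p>
(4) Ecotype
</p>
</span>"""

span = BeautifulSoup(span_html, "html.parser").span

# get_text() concatenates every string inside the <p>; split()/join() then
# collapses the surrounding newlines and repeated spaces into single spaces
lines = [" ".join(p.get_text().split()) for p in span.find_all("p")]

# Assumption: first <p> is the question, the rest are the options
question, options = lines[0], lines[1:]
print(question)
for option in options:
    print(option)
```

Normalising with `split()`/`join()` instead of `get_text(strip=True)` keeps a space between words when a `<p>` contains inline tags such as `<i>` or `<b>`. With `strip=True`, each fragment is trimmed and the fragments are glued together with no separator.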
How do I modify the code so that it outputs only the text of the question and its options?
For this question, the output should be:
```
The third name in trinomial nomenclature is
(1) Species
(2) Subgenus
(3) Subspecies
(4) Ecotype
```
In other words, I want to strip the `<p>` and `</p>` tags from the output.
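If only the plain text is needed, Beautiful Soup's `get_text()` can produce it in one call. The `separator` argument is inserted between text fragments, and `strip=True` trims each fragment and drops the whitespace-only ones. The sketch below is minimal and runs on the same span, rebuilt as a compact string so no network access is needed.

```python
from bs4 import BeautifulSoup

# The question span from the page, inlined so the sketch runs offline
span_html = (
    '<span class="-PmH" id="questionUXVlc3Rpb246NzE3MQ==">'
    "<p>The third name in trinomial nomenclature is</p>"
    "<p>(1) Species</p><p>(2) Subgenus</p>"
    "<p>(3) Subspecies</p><p>(4) Ecotype</p></span>"
)

container = BeautifulSoup(span_html, "html.parser").find("span", class_="-PmH")

# separator="\n" puts each text fragment on its own line;
# strip=True trims each fragment and skips whitespace-only ones
text = container.get_text(separator="\n", strip=True)
print(text)
```

In the original script, this amounts to replacing the final `print(...)` line with `print(containers[0].get_text(separator="\n", strip=True))`, or looping `for container in containers:` to print every question. Every text fragment becomes its own line, so a `<p>` containing inline tags such as `<i>` would be split across several lines. In that case, iterating over `container.find_all("p")` and joining each paragraph's text separately avoids the problem.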