I want not only the content of a page but also the formatting associated with each heading in my final document, rather than plain text in which the headings are not highlighted, e.g. all headings formatted in bold.
So far I extract just the text of my div container, which contains all the headings and paragraphs:
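from bs4 import BeautifulSoup

# page is assumed to be a response fetched earlier (not shown here)
soup = BeautifulSoup(page.content, 'html.parser')
t = soup.find_all('div', class_=['x'])
df = []
for i in t:
    # get_text() flattens the whole container, so the heading tags are lost here
    df.append(i.get_text())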
What I need now is to extract the text of each heading and then the paragraph(s) that follow it, so that I can format the headings. In other words, I would like to iterate through all the headings, extracting each heading and afterwards its paragraph text.
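
One possible way to do this, as a minimal sketch: ask BeautifulSoup for the heading and paragraph tags together, so they come back in document order, and record which kind each one is. The sample markup, the `extract_blocks` helper, the `HEADING_TAGS` list and the Markdown-style `**bold**` output are all assumptions made for illustration; the real page structure and the target document format are not known from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for page.content
html = """
<div class="x">
  <h2>First heading</h2>
  <p>First paragraph.</p>
  <p>Second paragraph.</p>
  <h2>Second heading</h2>
  <p>Third paragraph.</p>
</div>
"""

# Assumption: headings are ordinary h1..h6 tags
HEADING_TAGS = ["h1", "h2", "h3", "h4", "h5", "h6"]


def extract_blocks(markup, css_class="x"):
    """Return (kind, text) tuples for headings and paragraphs, in page order."""
    soup = BeautifulSoup(markup, "html.parser")
    blocks = []
    for container in soup.find_all("div", class_=css_class):
        # Passing a list of tag names returns all matches in document order
        for el in container.find_all(HEADING_TAGS + ["p"]):
            kind = "heading" if el.name in HEADING_TAGS else "paragraph"
            blocks.append((kind, el.get_text(strip=True)))
    return blocks


blocks = extract_blocks(html)
for kind, text in blocks:
    # Assumed output format: Markdown-style bold for headings
    print(f"**{text}**" if kind == "heading" else text)
```

Because each entry carries its kind, the final loop can be swapped for whatever the output format needs. Note that if divs with class `x` are nested inside each other, this sketch would collect the inner elements twice.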