
I want not just the text content of a page but also the formatting associated with each heading in my final document, so that the headings stand out instead of everything being plain text, e.g. all headings formatted bold.

So far I extract just the text of my div container, which contains all headings and paragraphs:

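from bs4 import BeautifulSoup

# `page` is the HTTP response fetched earlier (e.g. with requests.get)
soup = BeautifulSoup(page.content, 'html.parser')

# all div containers with class "x"
t = soup.find_all('div', class_='x')

df = []
for i in t:
    # get_text() flattens the whole container, so the heading markup is lost here
    df.append(i.get_text())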

What I need now is to extract the text of each heading and then the paragraphs that follow it, so that I can format the headings. So I would like to iterate through all headings, extracting each heading and afterwards its paragraph text.
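
To make the goal concrete, here is a minimal sketch of the kind of loop I have in mind. It is only an illustration under assumptions: the sample markup, the `extract_blocks` helper, the use of `h1` to `h6` and `p` tags, and the Markdown-style `**bold**` output are all made up for this example and not taken from the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for page.content
html = """
<div class="x">
  <h2>First heading</h2>
  <p>First paragraph.</p>
  <p>Second paragraph.</p>
  <h2>Second heading</h2>
  <p>Third paragraph.</p>
</div>
"""

HEADING_TAGS = ["h1", "h2", "h3", "h4", "h5", "h6"]


def extract_blocks(markup):
    """Return (kind, text) tuples for headings and paragraphs in document order."""
    soup = BeautifulSoup(markup, "html.parser")
    blocks = []
    for container in soup.find_all("div", class_="x"):
        # Passing a list of tag names makes find_all match any of them,
        # and results come back in the order they appear in the document.
        for tag in container.find_all(HEADING_TAGS + ["p"]):
            kind = "heading" if tag.name in HEADING_TAGS else "paragraph"
            blocks.append((kind, tag.get_text(strip=True)))
    return blocks


blocks = extract_blocks(html)
for kind, text in blocks:
    # Assumed output format: headings wrapped in Markdown bold markers
    print(f"**{text}**" if kind == "heading" else text)
```

Keeping the tag kind next to the text means the formatting decision can be made later, when the final document is written, instead of at scraping time.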

Martin Gergov
ctiid
