
I am trying to create a function that opens a URL and constructs an outline from the HTML code. The outline should include the text between any ... tags; basically, I just want to create an outline of a specific web page. Each heading level should also be properly numbered, with heading hx having x levels of numbering. How do I start?

jhg6699

1 Answer


There are a lot of tags in the HTML you linked besides headings. Anyway, this should get you started:

You need the packages beautifulsoup4 and requests for this. Python's standard library has built-in modules for these operations (urllib.request for fetching and html.parser for parsing), but these two packages make the job much easier.
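For comparison, here is a minimal sketch of the standard-library route, assuming Python 3: `urllib.request` fetches the page, and a subclass of `html.parser.HTMLParser` collects the text of each h1 to h6 element as a (level, text) pair. The names `HeadingParser`, `extract_headings` and `fetch_html` are hypothetical, and the sketch assumes headings are not nested inside one another.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class HeadingParser(HTMLParser):
    """Collects (level, text) pairs for every <h1>..<h6> element."""

    HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.headings = []   # list of (level, text) in document order
        self._level = None   # level of the heading currently open, if any
        self._buffer = []    # text chunks seen inside the open heading

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case
        if tag in self.HEADING_TAGS:
            self._level = int(tag[1])
            self._buffer = []

    def handle_data(self, data):
        # Only keep text that appears inside a heading
        if self._level is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag in self.HEADING_TAGS and self._level is not None:
            # Collapse runs of whitespace (including newlines) into single spaces
            text = " ".join("".join(self._buffer).split())
            self.headings.append((self._level, text))
            self._level = None


def extract_headings(html):
    """Return a list of (level, text) tuples for all headings in html."""
    parser = HeadingParser()
    parser.feed(html)
    parser.close()
    return parser.headings


def fetch_html(url):
    """Download a page and decode it using the charset the server reports."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Calling `extract_headings(fetch_html(url))` on the page from the question should give you the heading list, which you can then number level by level.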

import requests
from bs4 import BeautifulSoup

# If you want to parse another URL, change the link passed to get()
html = requests.get("http://homepage.cs.uiowa.edu/~lillis/016/2014Summer/assignments/HW12/jazz.html").text
# "html.parser" is built into Python; "lxml" would need the separate lxml package
soup = BeautifulSoup(html, "html.parser")
print(soup.body)  # Python 3 print function

This prints all the tags, text, and other content inside the body tag of the HTML. If you want different output or something more specific, leave a comment below and I'll change the code.
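A possible next step toward the numbered outline from the question, sketched with the same two packages: collect the h1 to h6 tags in document order with `find_all` and number them using one counter per level, so an hx heading gets x numbers. The function names `build_outline` and `outline_from_url` are hypothetical. One assumption to be aware of: a heading that skips a level (an h3 directly after an h1) gets a 0 in the skipped position, e.g. `1.0.1`.

```python
import requests
from bs4 import BeautifulSoup

HEADING_TAGS = ["h1", "h2", "h3", "h4", "h5", "h6"]


def build_outline(html):
    """Return a numbered, indented outline of the headings in html."""
    soup = BeautifulSoup(html, "html.parser")
    counters = [0] * 6  # counters[i] is the current number at level i + 1
    lines = []
    # find_all with a list of names returns matching tags in document order
    for tag in soup.find_all(HEADING_TAGS):
        level = int(tag.name[1])
        counters[level - 1] += 1
        # A new heading restarts numbering for all deeper levels
        counters[level:] = [0] * (6 - level)
        number = ".".join(str(n) for n in counters[:level])
        text = tag.get_text(" ", strip=True)
        lines.append("    " * (level - 1) + f"{number} {text}")
    return "\n".join(lines)


def outline_from_url(url):
    """Fetch url and build its outline; raises on HTTP errors."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return build_outline(response.text)
```

For example, `print(outline_from_url("http://homepage.cs.uiowa.edu/~lillis/016/2014Summer/assignments/HW12/jazz.html"))` prints one indented line per heading on that page.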

user2963623