I am trying to write a function that opens a URL and constructs an outline from the HTML code. The outline should include the text between any ...tags; basically, it should just create an outline of a specific web page. Each heading level should also be properly numbered, with heading hx having x levels of numbering. How do I start?
Viewed 366 times
1
-
It would be clearer if you could give an example of the HTML and the corresponding output – user2963623 Jul 27 '14 at 05:09
-
http://stackoverflow.com/questions/11709079/parsing-html-python – Raghav RV Jul 27 '14 at 06:39
-
@user2963623 Basically, I am creating an outline from this page: http://homepage.cs.uiowa.edu/~lillis/016/2014Summer/assignments/HW12/jazz.html – jhg6699 Jul 27 '14 at 18:02
1 Answer
0
The HTML you linked contains many tags besides headings. Anyway, this should get you started:
You need the packages beautifulsoup4 and requests for this. Python comes with built-in modules for these operations, but these two packages make the job much easier.
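import requests
from bs4 import BeautifulSoup

# If you want to parse another URL, change the link within get()
html = requests.get("http://homepage.cs.uiowa.edu/~lillis/016/2014Summer/assignments/HW12/jazz.html").text
# "html.parser" ships with Python; "lxml" also works if the lxml package is installed
soup = BeautifulSoup(html, "html.parser")
# Print everything inside the <body> tag
print(soup.body)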
This will print all the tags, text, and other content within the body tag of the HTML. If you want different or more specific output, leave a comment below and I'll change the code.
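If only the headings are needed, the numbering described in the question could be sketched roughly as follows. This is a minimal sketch, not a definitive solution: the function name build_outline and the sample HTML are made up for illustration (the sample is not the content of the linked page), and it assumes the headings are ordinary h1 to h6 tags. If a page skips a level (say, an h3 directly under an h1), the skipped level shows up as 0 in the number.

```python
from bs4 import BeautifulSoup

HEADING_TAGS = ["h1", "h2", "h3", "h4", "h5", "h6"]


def build_outline(html):
    """Return a list of numbered outline lines (strings), one per heading."""
    soup = BeautifulSoup(html, "html.parser")
    counters = [0] * 6  # counters[i] holds the current number at level i + 1
    outline = []
    # find_all returns a list-like result of Tag objects in document order
    for tag in soup.find_all(HEADING_TAGS):
        level = int(tag.name[1])  # "h3" -> 3
        counters[level - 1] += 1
        # A new heading at this level restarts the numbering of deeper levels
        for i in range(level, 6):
            counters[i] = 0
        number = ".".join(str(n) for n in counters[:level])
        # get_text() returns a str; collapse any internal whitespace
        text = " ".join(tag.get_text().split())
        outline.append("  " * (level - 1) + number + " " + text)
    return outline


# Hypothetical sample input, only to show the shape of the result
sample = (
    "<body><h1>Jazz</h1><p>intro</p><h2>Origins</h2>"
    "<h2>Styles</h2><h3>Bebop</h3><h1>Blues</h1></body>"
)
for line in build_outline(sample):
    print(line)
```

To run it on a real page, pass requests.get(url).text to build_outline instead of the sample string. The function returns a list, and each entry in it is a string.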

user2963623
-
Yes, only the headings. And when you read the lines for the headings, are they strings or lists? – jhg6699 Jul 30 '14 at 23:57