Building an outline from a webpage (Python)

Question

I am trying to write a function that opens a URL and constructs an outline from the page's HTML. The outline should include the text between any ... tags; basically, it should be an outline of a specific web page. Each heading level should also be properly numbered, with heading hx having x levels of numbering. How do I start?

It would be clearer if you could give an example of the HTML and the corresponding output. – user2963623, Jul 27 '14 at 05:09
http://stackoverflow.com/questions/11709079/parsing-html-python – Raghav RV, Jul 27 '14 at 06:39
@user2963623 Basically, I am creating an outline from this page: http://homepage.cs.uiowa.edu/~lillis/016/2014Summer/assignments/HW12/jazz.html – jhg6699, Jul 27 '14 at 18:02

user2963623 · Answer 1 · 2014-07-28T06:37:23.323


The HTML you linked contains many tags besides headings. Anyway, this should get you started:

You need the beautifulsoup4 and requests packages for this. Python's standard library can do the same job, but these two packages make it much easier.

import requests
from bs4 import BeautifulSoup

# To parse another page, change the URL below.
url = "http://homepage.cs.uiowa.edu/~lillis/016/2014Summer/assignments/HW12/jazz.html"
html = requests.get(url).text
# The "lxml" parser requires the lxml package; "html.parser" needs no extra install.
soup = BeautifulSoup(html, "lxml")
# Print everything inside the <body> tag.
print(soup.body)

This prints all the tags, text, and other content inside the body tag of the HTML. If you want different or more specific output, leave a comment below and I'll change the code.
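
For the numbered outline described in the question, one possible approach is to keep a counter per heading level and reset the deeper counters whenever a shallower heading appears. The following is only a sketch under some assumptions: build_outline is a made-up name, the indentation and "1.2.1" number format are guesses at the desired output, and it uses "html.parser" rather than "lxml" to avoid the extra dependency. A heading that skips a level (an h3 directly under an h1) would get a 0 in its number.

```python
from bs4 import BeautifulSoup

HEADING_TAGS = ["h1", "h2", "h3", "h4", "h5", "h6"]


def build_outline(html):
    """Return a list of numbered outline lines for the h1..h6 tags in html.

    Hypothetical helper: the output format is an assumption, not a spec.
    """
    soup = BeautifulSoup(html, "html.parser")
    counters = [0] * 6  # counters[i] counts headings at level i + 1
    lines = []
    # find_all with a list of names returns matching tags in document order.
    for tag in soup.find_all(HEADING_TAGS):
        level = int(tag.name[1])
        counters[level - 1] += 1
        # A new heading restarts the numbering of every deeper level.
        for i in range(level, 6):
            counters[i] = 0
        number = ".".join(str(n) for n in counters[:level])
        # Collapse runs of whitespace inside the heading text.
        text = " ".join(tag.get_text().split())
        lines.append("    " * (level - 1) + number + " " + text)
    return lines
```

To run it on a live page, pass in the downloaded text, for example build_outline(requests.get(url).text), and print the returned lines one per row.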

edited Jul 28 '14 at 06:37

answered Jul 28 '14 at 06:31

user2963623


is there another way of constructing the outline using and tags? – jhg6699 Jul 28 '14 at 14:32
So you only want the heading tags? – user2963623 Jul 28 '14 at 14:41
Yes, only the headings. And when you read the lines for the headings, are they strings or lists? – jhg6699 Jul 30 '14 at 23:57
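
On the string-or-list question: with BeautifulSoup, find_all gives back a list-like collection of Tag objects, and calling get_text() on each one gives a string. The same shape can be had with only the standard library, as in the rough sketch below. It assumes Python 3 (the module is html.parser there), and HeadingCollector and collect_headings are made-up names for illustration. The result is a list of (level, text) tuples, where level is an int and text is a str.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collect (level, text) pairs for h1..h6 elements (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.headings = []   # list of (int level, str text)
        self._level = None   # level of the heading currently being read
        self._parts = []     # text fragments seen inside that heading

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so "H2" arrives here as "h2".
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self._level = int(tag[1])
            self._parts = []

    def handle_data(self, data):
        if self._level is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if self._level is not None and tag == "h%d" % self._level:
            text = " ".join("".join(self._parts).split())
            self.headings.append((self._level, text))
            self._level = None


def collect_headings(html):
    """Return the headings in html as a list of (level, text) tuples."""
    parser = HeadingCollector()
    parser.feed(html)
    parser.close()
    return parser.headings
```

Looping over the returned list gives one heading at a time, so the level can drive the numbering and the text can be printed as is.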
