Or use BeautifulSoup
http://www.crummy.com/software/BeautifulSoup/bs4/doc/
Edit
Apparently I have to give you some hints on how to read the documentation.
- Open the link
- On the left there is a big menu (teal color)
- If you look carefully, you will notice that the documentation is divided into multiple sections:
  - Stuff
  - Navigating the tree
  - Searching the tree
  - Modifying the tree (got it)
  - Output (got it!)

And many more things.
Beautiful Soup is a Python library for pulling data out of HTML and XML files. It works with your favorite parser to provide idiomatic ways of navigating, searching, and modifying the parse tree. It commonly saves programmers hours or days of work.
Don't stop reading after the first sentence... The last one is pretty important, and so is what's in the middle.
In other words, you can create an empty document... let's say:
soup = BeautifulSoup("<div></div>")
document = soup.div
Then you read each line of your text, and do this whenever the line is plain text:
document.append(line)
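If the line starts with a `*`, create a new tag. Note that `new_tag` is a method of the `BeautifulSoup` object, not of the tag you are appending to:
ul = soup.new_tag('ul')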
document.append(ul)
document = ul
Then push all the `li` elements onto the document, and once you stop reading lines that start with `*`, just pop the parent so that `document` points back at the `div`. Keep doing that; you can even do it recursively to insert `ul`s into `ul`s.
Once you have parsed everything, you can do
str(document)
or
document.prettify()
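Putting the steps above together, here is a minimal sketch rather than a definitive implementation. It assumes the input is a list of text lines, that only a single level of `*` bullets is needed, and that the built-in `html.parser` is good enough. The function name `text_to_html` is made up for this example.

```python
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4


def text_to_html(lines):
    """Wrap plain lines in a <div>; each run of '*' lines becomes one <ul>.

    Hypothetical helper for illustration; handles one nesting level only.
    """
    soup = BeautifulSoup("<div></div>", "html.parser")
    root = soup.div
    document = root  # current insertion point, as in the answer

    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines are simply skipped
        if line.startswith("*"):
            # First bullet of a run: open a <ul> and descend into it.
            if document.name != "ul":
                ul = soup.new_tag("ul")
                document.append(ul)
                document = ul
            li = soup.new_tag("li")
            li.string = line.lstrip("*").strip()
            document.append(li)
        else:
            # A non-bullet line ends the list: pop back to the parent.
            if document.name == "ul":
                document = document.parent
            document.append(line + "\n")
    return root
```

Calling `str(text_to_html(lines))` gives the markup as a single string, and `.prettify()` on the result gives an indented version. Nested lists would need a stack of open `ul` tags instead of the single `document` pointer.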
Edit
I just realized that you weren't editing HTML but unformatted text. You could try using Markdown then.
http://daringfireball.net/projects/markdown/
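If you go the Markdown route from Python, one option is the third-party Python-Markdown package. That is an assumption on my part, since the link above only describes the syntax and does not name a Python library. A minimal sketch:

```python
import markdown  # third-party: pip install markdown

# Markdown expects a blank line between a paragraph and a list.
text = "Shopping list\n\n* apples\n* pears\n"

# Convert the whole text to an HTML fragment in one call.
html = markdown.markdown(text)
print(html)
```

The bullet lines come out as `li` elements inside a single `ul`, so there is no need to track where the list starts and ends yourself.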
,and whenever you stop matching, put in a
. – Guy Adini Jul 08 '12 at 14:41