I am using BeautifulSoup to handle HTML. When I use replace_with(), I get the result below: it escapes my '<' and '>'.
>>> tt = bs('<p><a></a></p>')
>>> bb = tt.p
>>> tt
<html><body><p><a></a></p></body></html>
>>> bb
<p><a></a></p>
>>> bb.replace_with('<p>aaaaaaa<aaaaa></p>')
<p><a></a></p>
>>> tt
<html><body>&lt;p&gt;aaaaaaa&lt;aaaaa&gt;&lt;/p&gt;</body></html>
I want tt to output this instead:
>>> tt
<html><body><p>aaaaaaa<aaaaa></p></body></html>
What should I do?
Thanks.
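One possible fix, sketched here assuming bs4 (BeautifulSoup 4) on Python 3: replace_with() treats a plain str as a text node, which is why its '<' and '>' are escaped when the tree is printed. Parsing the replacement markup into a tag first, and passing that tag to replace_with(), inserts real elements instead. Using the 'html.parser' builder is my own choice so the sketch does not need lxml; with it, tt has no <html><body> wrapper, and the unclosed <aaaaa> gets closed automatically.

```python
from bs4 import BeautifulSoup

# Same starting document as in the question ('html.parser' is an assumption).
tt = BeautifulSoup('<p><a></a></p>', 'html.parser')
bb = tt.p

# A plain str would become a text node and be escaped on output.
# Parsing it first turns the markup into a real <p> tag.
replacement = BeautifulSoup('<p>aaaaaaa<aaaaa></p>', 'html.parser').p

# replace_with() moves the parsed tag out of its temporary soup and into tt.
bb.replace_with(replacement)

print(tt)
```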
---------update--------------------------
I am writing a Python program that converts an HTML blog to Markdown. Its code is here.
My main approach is:
1. Use urllib2 to download the page source.
2. Use BeautifulSoup to parse the DOM tree.
3. Use BeautifulSoup to modify the existing DOM tree (here I use bs.replace_with).
4. Save the modified DOM tree to a Markdown file.
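The four steps above might look roughly like this in Python 3, with urllib.request standing in for urllib2. This is only a sketch: the function names fetch, html_to_markdown and save are hypothetical, and only links, bold text and headings are converted. The idea is to replace tags with plain strings that hold Markdown syntax, then read the result with get_text() rather than str(), because get_text() returns the text without re-escaping it.

```python
import urllib.request

from bs4 import BeautifulSoup


def fetch(url):
    # Step 1: download the page (urllib.request is Python 3's successor to urllib2).
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode('utf-8')


def html_to_markdown(html):
    # Step 2: parse the DOM tree.
    soup = BeautifulSoup(html, 'html.parser')

    # Step 3: modify the tree. Each replacement is a plain string holding
    # Markdown syntax, not HTML, so escaping on output would not matter.
    for a in soup.find_all('a'):
        a.replace_with('[%s](%s)' % (a.get_text(), a.get('href', '')))
    for tag in soup.find_all(['strong', 'b']):
        tag.replace_with('**%s**' % tag.get_text())
    for level in range(1, 7):
        for h in soup.find_all('h%d' % level):
            h.replace_with('#' * level + ' ' + h.get_text() + '\n\n')

    # get_text() joins the text nodes as-is, so '<', '>' and '&'
    # come back unescaped (str(soup) would turn them into entities).
    return soup.get_text()


def save(markdown, path):
    # Step 4: write the result to a Markdown file.
    with open(path, 'w', encoding='utf-8') as f:
        f.write(markdown)
```

For example, `html_to_markdown('<p>service tool-&gt;SQL Server Reporting Services</p>')` keeps the `->` as typed instead of producing `-&gt;`.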
The problem is that BeautifulSoup automatically escapes '<' and '>' when I modify the DOM tree, so the tree is not modified the way I expect. For example, the HTML is:
service tool->SQL Server Reporting Services
The Markdown output is:
service tool-&gt;SQL Server Reporting Services
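The escaping in the Markdown output most likely comes from serialization rather than from replace_with() itself: str() on a tag uses BeautifulSoup's default "minimal" formatter, which turns '&', '<' and '>' in text back into entities. A small sketch (bs4 and 'html.parser' assumed) comparing three ways to read the same tag:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<p>service tool-&gt;SQL Server Reporting Services</p>',
                     'html.parser')
p = soup.p

escaped = str(p)                # default "minimal" formatter re-escapes '>'
raw = p.decode(formatter=None)  # keeps the tags, skips entity substitution
text = p.get_text()             # text only, no tags and no entities

print(escaped)
print(raw)
print(text)
```

For Markdown, get_text() is probably the better fit, since the output should not contain HTML tags anyway; decode(formatter=None) (or prettify(formatter=None)) is an option when the tags must be kept.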