
I am using BeautifulSoup to process HTML. When I call replace_with(), I get the result below: it escapes my '<' and '>'.

>>> tt = bs('<p><a></a></p>')
>>> bb = tt.p
>>> tt
<html><body><p><a></a></p></body></html>
>>> bb
<p><a></a></p>
>>> bb.replace_with('<p>aaaaaaa<aaaaa></p>')
<p><a></a></p>
>>> tt
<html><body>&lt;p&gt;aaaaaaa&lt;aaaaa&gt;&lt;/p&gt;</body></html>

I want tt to look like this instead:

>>> tt

<html><body><p>aaaaaaa<aaaaa></p></body></html>

What should I do? Thank you.
---------update--------------------------
I am writing a Python program that converts an HTML blog post to Markdown. Its code is here. My main approach is:
1. Use urllib2 to fetch the page source
2. Use BeautifulSoup to parse the DOM tree
3. Use BeautifulSoup to modify the existing DOM tree (this is where I call replace_with)
4. Save the modified DOM tree to a Markdown file

The problem is that BeautifulSoup automatically escapes '<' and '>' when I modify the DOM tree, so the tree does not end up modified the way I expected. The HTML is

 service tool->SQL Server Reporting Services

The Markdown output is

 service tool-&gt;SQL Server Reporting Services
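
If the escaping only matters at the point where the Markdown text is written out, one possible workaround is to unescape the entities in the final string. This is a minimal sketch, assuming Python 3 (where the standard library provides html.unescape); the variable names are hypothetical and not taken from the program described above.

```python
import html

# Text as it comes out of the serialized tree, with '>' escaped to '&gt;'.
escaped = 'service tool-&gt;SQL Server Reporting Services'

# html.unescape turns character references such as &gt; and &lt;
# back into the characters they stand for.
markdown_line = html.unescape(escaped)
print(markdown_line)
```

This only changes the text that gets saved; the tree itself still holds the replacement as a text node rather than as tags.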
geqianst
    Have you looked at [this post](http://stackoverflow.com/questions/9939248/how-to-prevent-django-basic-inlines-from-autoescaping)? – kdopen Dec 18 '14 at 17:53

1 Answer

from bs4 import BeautifulSoup
tt = BeautifulSoup('<p><a></a></p>')

# Parse the replacement markup first so that a Tag, not a plain string, is inserted.
new = BeautifulSoup('<p>aaaaaaa<aaaaa></p>')
tt.p.replace_with(new.p)
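
The difference between the two calls can be seen side by side: a plain str handed to replace_with() is inserted as a text node, so its '<' and '>' are escaped on output, while a parsed Tag is inserted as markup. The sketch below is illustrative only; it assumes the built-in 'html.parser' backend (which does not add html/body wrappers) and uses a made-up replacement fragment with a well-formed <b> tag.

```python
from bs4 import BeautifulSoup

# Case 1: a plain str becomes a text node, so '<' and '>' are escaped on output.
as_text = BeautifulSoup('<p><a></a></p>', 'html.parser')
as_text.p.replace_with('<p>aaaaaaa<b>bold</b></p>')

# Case 2: parse the replacement first, then insert the resulting Tag.
as_tags = BeautifulSoup('<p><a></a></p>', 'html.parser')
fragment = BeautifulSoup('<p>aaaaaaa<b>bold</b></p>', 'html.parser')
as_tags.p.replace_with(fragment.p)

print(as_text)  # escaped text
print(as_tags)  # real tags
```

Only the second tree can be searched or modified further as tags, for example with find('b').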

Alternatively, keep your original code and pass formatter=None when printing, which turns off entity escaping in the output:

from bs4 import BeautifulSoup
tt = BeautifulSoup('<p><a></a></p>')
# The string is still stored as text; only the printed output is left unescaped.
tt.p.replace_with('<p>aaaaaaa<aaaaa></p>')
print(tt.prettify(formatter=None))
<html>
 <body>
  <p>aaaaaaa<aaaaa></p>
 </body>
</html>

You can also replace just the string inside a tag. I am not sure exactly what you want to achieve, but the documentation covers this clearly.

Padraic Cunningham