I am reading a string from a file:

a = '<script>closedSign: \'<img src="/static/images/drop-down.png" style="margin-top: -3px;"  />\'</script>'

Now, when I run

BeautifulSoup(a)

I get:

<script>closedSign: '&lt;img src="/static/images/drop-down.png" style="margin-top: -3px;"   /&gt;'</script>

Thus, <img is being HTML-escaped into &lt;img inside the script tag.

How can I avoid this?

Jamal

2 Answers

Use BeautifulSoup 3.2.0 instead of 3.2.1 to fix this problem.

Jamal
  • Fixed the problem for me too. Thanks a lot for the solution and -1 for BeautifulSoup breaking such a thing after a minor update. – Ponytech Oct 21 '13 at 19:17

Look at the "Entity Conversion" section of the Beautiful Soup Documentation.

from BeautifulSoup import BeautifulSoup  # Beautiful Soup 3 import path
soup = BeautifulSoup(html, convertEntities=BeautifulSoup.HTML_ENTITIES)
Paulo Scardine
  • Paulo, convertEntities=BeautifulSoup.HTML_ENTITIES works on strings that are already HTML-escaped, for example to return <img from parsing &lt;img. But here the problem is that the parsing itself returns an HTML-escaped string. I hope I was able to explain my problem clearly. – Jamal Nov 09 '12 at 13:52
  • What happens if you run BeautifulSoup a second time over the script tag HTML? – Paulo Scardine Nov 09 '12 at 16:02
  • For possible `bs4` users: `convertEntities` [doesn't](https://www.crummy.com/software/BeautifulSoup/bs4/doc/#entities) exist anymore in bs4. – arash Dec 04 '20 at 12:31
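
If neither downgrading nor `convertEntities` is an option, one possible workaround is to undo the escaping after parsing. The sketch below is not taken from the answers above; it assumes Python 3.4+ (where the standard library provides `html.unescape`) and that the escaped script body has already been pulled out of the parsed tree as a plain string. The variable names `escaped` and `restored` are illustrative only.

```python
import html

# The escaped script body, as reported in the question's output
# (hard-coded here so the example runs without Beautiful Soup).
escaped = (
    "closedSign: '&lt;img src=\"/static/images/drop-down.png\" "
    "style=\"margin-top: -3px;\" /&gt;'"
)

# html.unescape converts named and numeric character references
# (such as &lt; and &gt;) back into the characters they stand for.
restored = html.unescape(escaped)

print(restored)
```

Note that this reverses every character reference in the string, so it is only appropriate when the original script body contained no intentional entities.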