
The problem

from bs4 import BeautifulSoup
a=BeautifulSoup('<p class="t5">&#x20b9; 10,000 or $ 133.46</p>')
b=open('file.html','w')
b.write(str(a))

The result is

UnicodeEncodeError: 'charmap' codec can't encode character '\u20b9' in position 19038: character maps to <undefined>

The error is caused by the &#x20b9; entity. It does not occur when converting the bs4 object to str, only when writing that string to a file.

What I have tried

  1. Convert HTML entities into Unicode string
  2. How to convert a bs4.element.ResultSet to strings? Python
  3. Convert an amount to Indian Notation in Python
  4. How do I unescape HTML entities in a string in Python 3.1?

What a solution might look like

Converting a BeautifulSoup object into a string without changing the &#x20b9; entity into ₹ (which, by the way, str() does), and then saving that string to a file.
Kumar Saptam

1 Answer


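Pass encoding='utf-8' when opening the file. Without it, open() falls back to the platform's default encoding, and the 'charmap' in the error message suggests that here it is a Windows code page such as cp1252, which has no mapping for ₹ (U+20B9).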

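Example:

from bs4 import BeautifulSoup

# Name the parser explicitly so the result does not depend on which parsers are installed
a = BeautifulSoup('<p class="t5">&#x20b9; 10,000 or $ 133.46</p>', 'html.parser')

# UTF-8 can encode every Unicode character, including ₹ (U+20B9)
with open('file.html', 'w', encoding='utf-8') as outfile:
    outfile.write(str(a))  # OR outfile.write(a.prettify())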

Output:

<p class="t5">₹ 10,000 or $ 133.46</p>
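
If the file really has to contain an entity instead of the literal ₹, as the question asks, one possible approach is to let Python's built-in xmlcharrefreplace error handler turn every non-ASCII character into a numeric character reference. This is only a sketch using the standard library: the helper name to_ascii_with_entities and the temporary output path are made up for illustration, and the html string stands in for what str(a) returns. Note that the reference comes out in decimal form (&#8377;) rather than the original hexadecimal &#x20b9;; browsers render both as ₹.

```python
import os
import tempfile

# Stand-in for str(a): BeautifulSoup has already decoded the entity to the ₹ character.
html = '<p class="t5">\u20b9 10,000 or $ 133.46</p>'


def to_ascii_with_entities(text: str) -> str:
    """Replace each non-ASCII character with a numeric character reference.

    Hypothetical helper, not part of BeautifulSoup.
    """
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


escaped = to_ascii_with_entities(html)
print(escaped)  # <p class="t5">&#8377; 10,000 or $ 133.46</p>

# The escaped text is pure ASCII, so the file encoding no longer matters much.
# The path below is a throwaway location chosen for this example only.
path = os.path.join(tempfile.mkdtemp(), "file.html")
with open(path, "w", encoding="ascii") as outfile:
    outfile.write(escaped)
```

Writing the file as UTF-8 is usually the simpler fix; the escaped form is mainly useful when whatever reads the file cannot be trusted to decode UTF-8.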
Rakesh