
Good morning. I'm trying to run the code below, but Python won't let me: it raises a UnicodeDecodeError.

Can you help me?

Thank you very much.

soup = BeautifulSoup(html_page)
titulo = soup.find('h3').get_text()
titulo = titulo.replace('§', '')

titulo = titulo.replace('§', '')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 0: ordinal not in range(128)

2 Answers


Declare the source file encoding and work with unicode strings throughout:

# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup

html_page = u"<h3>§ title here</h3>"

soup = BeautifulSoup(html_page, "html.parser")

titulo = soup.find('h3').get_text()
titulo = titulo.replace(u'§', '')
print(titulo)

This prints " title here" (the leading space is left over after the "§" is removed).
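To see where the original error comes from, the sketch below reproduces by hand the implicit conversion that Python 2 performs. It assumes the source file is saved as UTF-8, so that the byte-string literal '§' consists of the two bytes 0xc2 0xa7; the variable names are illustrative and not taken from the question.

```python
# -*- coding: utf-8 -*-
# Illustrative sketch; variable names are not from the original question.

# In a UTF-8 source file, the Python 2 byte string '§' holds these two bytes.
section_bytes = u'§'.encode('utf-8')

# Mixing a unicode string with a byte string makes Python 2 decode the
# byte string with the ASCII codec. That decode is done explicitly here.
try:
    section_bytes.decode('ascii')
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

# Keeping both arguments as unicode avoids the implicit decode entirely.
titulo = u'§ title here'
cleaned = titulo.replace(u'§', u'')

print(error_message)
print(repr(cleaned))
```

The first byte, 0xc2, is outside the ASCII range, which is why the error in the question points at position 0.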

alecxe

Let me explain what the problem is:

By default, Python 2 assumes that source files are ASCII, so it rejects non-ASCII characters such as "à" or "ò" in the source. To make Python accept them, declare the encoding at the top of your script:

# -*- coding: utf-8 -*-

This declaration tells Python to read the source file as UTF-8. Another approach, available in Python 2 only, is to change the default encoding through the "sys" module:

import sys
# sys.setdefaultencoding() is removed at startup, so it does not exist at this point
reload(sys)  # reloading the sys module makes it available again
sys.setdefaultencoding('UTF8')  # choose the default encoding here
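An alternative that does not touch the interpreter-wide default is to decode byte strings explicitly at the point where they enter the program. The following is a minimal sketch, assuming the input is UTF-8 encoded; to_text is a hypothetical helper written for this example, not a BeautifulSoup or standard library function.

```python
# -*- coding: utf-8 -*-
# Hypothetical helper, not part of BeautifulSoup or the standard library.

def to_text(value, encoding='utf-8'):
    """Return value as a unicode string, decoding byte strings explicitly."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value

# Bytes such as those read from a file or a socket (assumed to be UTF-8).
raw_title = u'§ title here'.encode('utf-8')

# Decode first, then do all string operations on unicode values.
titulo = to_text(raw_title).replace(u'§', u'').strip()
print(titulo)
```

With this approach every conversion is visible in the code, so a wrong encoding fails at the decode call rather than somewhere later in a string operation.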
dreamwhite