hi § symbol unrecognized

Question

Good morning. I'm trying to do the following, but it won't let me.

Can you help me?

Thank you very much.

soup = BeautifulSoup(html_page)
titulo = soup.find('h3').get_text()
titulo = titulo.replace('§', '')

titulo=titulo.replace('§','')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 0: ordinal not in range(128)

What's the text of `h3`? – Jossie Calderon Jul 24 '16 at 17:14

score 3 · Answer 1 · edited May 23 '17 at 12:22


Declare the source encoding and operate on unicode strings:

# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup

html_page = u"<h3>§ title here</h3>"

soup = BeautifulSoup(html_page, "html.parser")

titulo = soup.find('h3').get_text()
titulo = titulo.replace(u'§', '')
print(titulo)

Prints title here.
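To see where the error message in the question comes from, here is a minimal sketch. In Python 2, passing the byte string '§' to replace() on a unicode string makes Python decode those bytes with the ASCII codec first, and the first UTF-8 byte of § is 0xc2. The snippet reproduces that decode step explicitly so it also runs on Python 3; the variable names are illustrative and not from the question.

```python
# -*- coding: utf-8 -*-
# The UTF-8 encoding of the section sign is two bytes: 0xC2 0xA7.
section_bytes = u'§'.encode('utf-8')

try:
    # This is the decode that Python 2 attempts implicitly when a
    # byte string is mixed with a unicode string.
    section_bytes.decode('ascii')
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

print(error_message)
```

Writing the literal as u'§' avoids the implicit decode altogether, which is why the "u" prefix is enough.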

answered Jul 24 '16 at 17:17

alecxe


Thanks, I only needed the "u". – Damian Perez Jul 24 '16 at 18:53

score 0 · Answer 2 · answered Jul 24 '16 at 18:05

Let me explain clearly what the problem is:

By default, Python 2 assumes source files are ASCII, so it does not recognize non-ASCII characters such as "à" or "ò". To make Python recognize those characters, put this at the top of your script:

# -*- coding: utf-8 -*-

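This declaration tells Python how to decode the non-ASCII characters in the source file. Another method changes the default encoding used for implicit conversions between byte strings and unicode strings, using the "sys" module (Python 2 only, and generally discouraged because it affects the whole program):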

# Python 2 only: site.py deletes sys.setdefaultencoding() at startup
import sys
reload(sys)  # Reloading sys restores setdefaultencoding()
sys.setdefaultencoding('UTF8')  # Set the default encoding to UTF-8
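If you would rather not change a global setting, one possible alternative is to decode byte strings explicitly before working with them. This is only a sketch under assumptions: to_text is a hypothetical helper (not part of BeautifulSoup or the standard library), the sample bytes are made up, and the input is assumed to be UTF-8.

```python
# -*- coding: utf-8 -*-

def to_text(value, encoding='utf-8'):
    """Hypothetical helper: return unicode text, decoding bytes if needed."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value

# Made-up sample input: the UTF-8 bytes for "§ title here".
raw_title = b'\xc2\xa7 title here'

# Decode first, then remove the symbol using a unicode literal.
titulo = to_text(raw_title).replace(u'§', u'').strip()
print(titulo)
```

The same code runs on Python 2 and Python 3, and every conversion is visible at the point where it happens.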
