How to solve encoding error in Python

Question

I want to scrape some content from a webpage. This is the code:

import requests
from bs4 import BeautifulSoup
import urllib2
url = "anUrl"
r = requests.get(url)
soup = BeautifulSoup(r.text,'lxml')
print soup.prettify()

This is the error description: UnicodeEncodeError: 'charmap' codec can't encode character u'\u2013' in position : character maps to <undefined>

This kind of error can be triggered by different characters, not always the same one, so I need a generic solution.

What are you using for a console, i.e. where is the `print` output going? — Mark Ransom, Oct 15 '15 at 15:14
I'm printing it on the command line, but I need to display it in a browser. – Poggio, Oct 15 '15 at 15:15
But is it Windows, Linux, or something else? And if you put it on a browser you won't be using `print` anymore, correct? — Mark Ransom, Oct 15 '15 at 15:17
Windows. Yes, I'm running some tests on the command line, then I will change the output. – Poggio, Oct 15 '15 at 15:20

score 2 · Answer 1 · edited May 23 '17 at 12:14

I think you have the same problem as: UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in position 3 2: ordinal not in range(128)

So you can use u'\u2013'.encode('utf8') :) (to be more specific, use soup.prettify().encode('utf8'))

Or switch to Python 3 ;)
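As a rough illustration of what the `.encode('utf8')` suggestion does: it turns the Unicode string into a UTF-8 byte string, so `print` no longer has to run the text through the console codec. The sketch below uses a made-up sample string (`text` is a hypothetical stand-in for `soup.prettify()`), and the `u''` literals are written so that it should also run on Python 3.

```python
# Hypothetical sample text; stands in for soup.prettify(), which returns unicode.
text = u'pages 10\u201320'

# Encoding to ASCII fails on the en dash, much like the error in the question.
try:
    text.encode('ascii')
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# UTF-8 can represent every Unicode character, so this call does not raise.
utf8_bytes = text.encode('utf8')
```

One caveat: a Windows console that is not set to UTF-8 will probably show those bytes as a few odd characters instead of the dash, but it should not raise the error.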

answered Oct 15 '15 at 15:17

Labo

I've already looked at that answer. I'm forced to use Python 2.*, but I don't know where to put u'\u2013'.encode('utf8') in my code. – Poggio Oct 15 '15 at 15:23
It should be `r.text.encode('utf8')` or `r.content.encode('utf8')`; I don't know exactly where you get the error. – EsseTi Oct 15 '15 at 15:25
You don't say exactly where you're getting your error, but from your description it sounds like you might need to properly encode the pretty soup going out to the terminal with: `print soup.prettify().encode('utf8')`. – xnx Oct 15 '15 at 15:36

score 1 · Accepted Answer · answered Oct 15 '15 at 15:46

To fix the print command, you can explicitly encode the output. You have many different choices depending on how you want to treat Unicode characters.

If you simply want to eliminate any characters that aren't supported by your console:

import sys
print soup.prettify().encode(sys.stdout.encoding, 'ignore')

If you want to replace characters that aren't supported with a placeholder character (typically a question mark):

print soup.prettify().encode(sys.stdout.encoding, 'replace')

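If you want to show characters as an escape sequence (note that raw_unicode_escape writes Latin-1 characters as raw bytes and escapes only code points above U+00FF as \uXXXX):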

print soup.prettify().encode('raw_unicode_escape')
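To compare these choices side by side, here is a small sketch. It hard-codes `cp437` as the console encoding, which is only an assumption about a typical Windows command prompt (in real code that would be `sys.stdout.encoding`), and it uses a made-up sample string in place of `soup.prettify()`. It also includes `backslashreplace`, a standard error handler that the answer does not mention, as another way to get escape sequences.

```python
# Assumption: cp437 stands in for sys.stdout.encoding on a Windows console.
console_encoding = 'cp437'

# Hypothetical sample; the en dash (U+2013) has no cp437 equivalent.
text = u'pages 10\u201320'

# 'ignore' silently drops the unsupported character.
ignored = text.encode(console_encoding, 'ignore')

# 'replace' substitutes a question mark.
replaced = text.encode(console_encoding, 'replace')

# 'backslashreplace' writes the character as a \uXXXX escape.
escaped = text.encode(console_encoding, 'backslashreplace')

# raw_unicode_escape gives the same bytes for this particular sample.
raw_escaped = text.encode('raw_unicode_escape')
```

When output is piped to a file or another program, `sys.stdout.encoding` can be `None` on Python 2, so a fallback along the lines of `sys.stdout.encoding or 'utf-8'` may be needed.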

When you're ready to write to HTML output, you should encode it consistently to the encoding that your web page will use, preferably UTF-8.

f.write(soup.prettify().encode('utf-8'))
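For the file case, one possible sketch is below. It assumes `io.open` is used to open the file (the answer does not say how `f` is created), which lets the file object do the UTF-8 encoding so the string is written without a manual `.encode()` call. The sample markup and the temporary output path are made up for illustration.

```python
import io
import os
import tempfile

# Hypothetical stand-in for soup.prettify(); must be a unicode string.
html = u'<p>pages 10\u201320</p>'

# Hypothetical output location inside a fresh temporary directory.
out_path = os.path.join(tempfile.mkdtemp(), 'page.html')

# io.open in text mode encodes to UTF-8 as it writes.
with io.open(out_path, 'w', encoding='utf-8') as f:
    f.write(html)

# Read the file back as raw bytes to inspect what was stored.
with io.open(out_path, 'rb') as f:
    raw = f.read()
```

If the page also declares its encoding, for example with `<meta charset="utf-8">`, the browser should decode the bytes the same way they were written.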

Do you know how to display the Python script's output in a browser through JavaScript? In a previous Python script I used this: print "Content-type: text\n\n", but in that case I was not using BeautifulSoup, so now I'm not able to pass a useful object to the JS script. – Poggio, Oct 16 '15 at 15:22
@Poggio sorry, I haven't yet used Python to output a web page so it's outside of my area of expertise. — Mark Ransom, Oct 16 '15 at 15:34
