UnicodeEncodeError when writing to a file

Question

I am trying to write some strings to a file (the strings have been given to me by the HTML parser BeautifulSoup).

I can use "print" to display them, but when I use file.write() I get the following error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xa3' in position 6: ordinal not in range(128)

How can I fix this?

yossi · Answer 1 · 2011-08-04T10:37:29.393

16

This error occurs when you pass a Unicode string containing non-ASCII characters (code points above 127) to something that expects an ASCII bytestring. In Python 2 the default encoding used for implicit conversions between unicode and byte strings is ASCII, "which handles exactly 128 (English) characters" (code points 0 to 127). That is why converting any character outside that range produces the error.
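A minimal sketch of the same failure in Python 3 terms (an assumption: the answer describes Python 2, where the conversion happens implicitly; in Python 3 the identical error appears whenever text is encoded as ASCII, explicitly or by a file whose encoding is ASCII). The pound sign from the question is code point 163, so ASCII cannot represent it:

```python
# Python 3 sketch: reproduce the error from the question explicitly.
price = "\u00a3123"  # the string "£123" from the question

# The pound sign is code point 163 (0xA3), outside ASCII's range of 0-127.
print(ord(price[0]))  # 163

error = None
try:
    price.encode("ascii")
except UnicodeEncodeError as exc:
    error = exc  # keep a reference; 'exc' is cleared after the except block
    # exc.start is the index of the first character ASCII cannot encode.
    print("failed:", exc.reason, "at position", exc.start)
```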

The unicode() constructor has the signature unicode(string[, encoding, errors]). All of its arguments should be 8-bit strings.

The first argument is converted to Unicode using the specified encoding; if you leave off the encoding argument, the ASCII encoding is used for the conversion, so any byte greater than 127 is treated as an error.
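Python 3 has no unicode() builtin; the closest equivalent is bytes.decode() (or str(data, encoding)), which follows the same rule. A minimal sketch, assuming the input bytes are UTF-8 (the variable names are illustrative):

```python
# Python 3 sketch: turning bytes into text needs the right encoding.
raw = "\u00a3123".encode("utf-8")  # b'\xc2\xa3123'

# Decoding as ASCII fails on bytes above 127, like unicode(raw) in Python 2.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

text = raw.decode("utf-8")  # '£123'
same = str(raw, "utf-8")    # equivalent to the line above

# The errors argument trades strictness for lossy output:
# each undecodable byte becomes U+FFFD REPLACEMENT CHARACTER.
lossy = raw.decode("ascii", errors="replace")
```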

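For example (Python 2):

s = u'La Pe\xf1a'
print s.encode('latin-1')

or, when writing to a file,

with open('output.txt', 'w') as f:  # 'output.txt' is a placeholder name
    f.write(s.encode('latin-1'))

will encode the string using Latin-1 before it is printed or written.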
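The same idea in Python 3, as a sketch (the temporary file path is only for the demo): either encode yourself and write bytes to a file opened in binary mode, or pass the encoding to open() and let the file object do it. Both produce identical bytes on disk.

```python
import os
import tempfile

s = "La Pe\u00f1a"  # same text as u'La Pe\xf1a' above

# Temporary file path used only for this demo.
path = os.path.join(tempfile.mkdtemp(), "latin1_demo.txt")

# Option 1: encode manually, then write bytes in binary mode.
with open(path, "wb") as f:
    f.write(s.encode("latin-1"))
with open(path, "rb") as f:
    data_manual = f.read()

# Option 2: open in text mode with an explicit encoding.
with open(path, "w", encoding="latin-1") as f:
    f.write(s)
with open(path, "rb") as f:
    data_auto = f.read()

print(data_manual)  # b'La Pe\xf1a'
```

Latin-1 covers the pound sign from the question (0xA3), but it can only represent code points up to 255; for arbitrary scraped text, UTF-8 is the safer choice.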

edited Aug 04 '11 at 10:37

answered Aug 04 '11 at 10:24

yossi


The string it's outputting is a price like "£123" – Ivy Aug 04 '11 at 10:25
which is not valid ASCII. The pound sign is char code 163, outside the ASCII range of 0 to 127. – Daniel Roseman Aug 04 '11 at 10:28
You must specify an encoding that can encode those characters. Files do not contain characters; they contain bytes. Encodings convert characters to bytes. – Karl Knechtel Aug 04 '11 at 10:29
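To make the "characters vs. bytes" point concrete, here is a small Python 3 sketch: the same three-character string encodes to different byte sequences under different encodings, and decoding with the wrong one silently produces garbage rather than an error.

```python
text = "\u00a3123"  # "£123"

# The same characters produce different bytes under different encodings.
encoded = {codec: text.encode(codec) for codec in ("utf-8", "latin-1", "utf-16-le")}
for codec, data in encoded.items():
    print(codec, data)

# Decoding UTF-8 bytes as Latin-1 does not fail; it yields mojibake instead.
wrong = encoded["utf-8"].decode("latin-1")  # 'Â£123'
```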

Yes, when I say "you must do this" I understand perfectly that you aren't doing it yet. That's why you must do it: to fix the problem you describe. `write()` doesn't "understand Unicode" because (a) files do not contain characters, but bytes; and (b) there **is more than one way to do the encoding** and there is no particularly good way for it to choose on your behalf. Well, actually, it does: it picks the simplest possible encoding, which only handles the few characters that everyone agrees upon, so that an error comes up if anything special is required. – Karl Knechtel Aug 04 '11 at 11:08
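In Python 3 this behaviour changed: a text-mode file carries an encoding, and when you do not pass one it is platform dependent (normally taken from the locale). A minimal sketch (the temporary path is illustrative) that inspects the encoding a file object will use, which is why passing encoding= explicitly is the robust fix:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "encoding_demo.txt")

# Without encoding=, the choice depends on the platform and locale.
with open(path, "w") as f:
    default_encoding = f.encoding

# With encoding=, the choice is explicit and the same on every machine.
with open(path, "w", encoding="utf-8") as f:
    explicit_encoding = f.encoding
    f.write("\u00a3123")

with open(path, "rb") as f:
    on_disk = f.read()

print(default_encoding)  # varies by platform and locale settings
print(explicit_encoding, on_disk)
```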

score 2 · Answer 2 · answered Aug 04 '11 at 10:39

The answer to your question is "use codecs". The appended code also shows some gettext magic, FWIW. http://wiki.wxpython.org/Internationalization

import codecs
import gettext

import wx  # wxPython, required for wx.Locale below

localedir = './locale'
langid = wx.LANGUAGE_DEFAULT  # use OS default; or use LANGUAGE_JAPANESE, etc.
domain = "MyApp"
mylocale = wx.Locale(langid)
mylocale.AddCatalogLookupPathPrefix(localedir)
mylocale.AddCatalog(domain)

translater = gettext.translation(domain, localedir,
                                 [mylocale.GetCanonicalName()], fallback=True)
translater.install(unicode=True)

# translater.install() puts the gettext _() function into the builtins namespace

msg = _("A message that gettext will translate, probably putting Unicode in here")

# use codecs.open() to encode Unicode strings as UTF-8 while writing
logfile_name = 'myapp.log'  # placeholder path
with codecs.open(logfile_name, 'w', encoding='utf-8') as logfile:
    logfile.write(msg + u'\n')
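Stripped of the wx/gettext machinery, the relevant part is just "open the file with an encoding". A minimal sketch using io.open, on the assumption that portability matters: io.open accepts an encoding argument on Python 2.6+ and is the same function as the built-in open on Python 3. The log file name and message are placeholders.

```python
import io
import os
import tempfile

# Placeholder log file location for the demo.
logfile_name = os.path.join(tempfile.mkdtemp(), "myapp.log")

msg = u"Price: \u00a3123"  # any Unicode string, e.g. a translated message

# io.open encodes text to UTF-8 as it is written, much like codecs.open.
with io.open(logfile_name, "w", encoding="utf-8") as logfile:
    logfile.write(msg + u"\n")

# Read the raw bytes back to see what actually landed on disk.
with io.open(logfile_name, "rb") as f:
    raw = f.read()
```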

Despite Google being full of hits on this problem, I found it rather hard to find this simple solution (it is actually in the Python docs about Unicode, but rather buried).

So ... HTH...

GaJ

"Simple"? That's also showing a bunch of i18n machinery that OP doesn't care about - he's not trying to make sure that people see text in the right language, he's trying to grab text in a specific language from a specific source and put it in a file. So the only relevant part of your snippet is the first line and the last two, really. As for "hard to find", really? What did you Google for? I tried `UnicodeEncodeError: 'ascii' codec can't encode character`; the results seem helpful enough... – Karl Knechtel, Aug 04 '11 at 11:13

score 1 · Accepted Answer · edited Mar 01 '23 at 10:16


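I tried this and it works fine (Python 3):

with open(r"C:\rag\sampleoutput.txt", 'w', encoding="utf-8") as f:
    f.write(text)  # text: placeholder for the Unicode string to save, e.g. from BeautifulSoup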
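A fuller end-to-end sketch in Python 3, assuming the text comes from an HTML page. To stay self-contained it uses the standard library's html.parser.HTMLParser as a stand-in for BeautifulSoup (a swap, not what the question used); the HTML snippet and the temporary output path are hypothetical.

```python
import os
import tempfile
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects text nodes; a rough stand-in for BeautifulSoup's get_text()."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default: &pound; -> '£'
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


html_doc = "<p class='price'>&pound;123</p>"  # hypothetical input page
parser = TextCollector()
parser.feed(html_doc)
parser.close()
text = "".join(parser.chunks)  # '£123'

# Hypothetical output path standing in for C:\rag\sampleoutput.txt.
path = os.path.join(tempfile.mkdtemp(), "sampleoutput.txt")

with open(path, "w", encoding="utf-8") as f:
    f.write(text)

# Reading back with the same encoding returns the original text.
with open(path, encoding="utf-8") as f:
    round_trip = f.read()
```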

edited Mar 01 '23 at 10:16

Ivy


answered Feb 28 '23 at 08:32

Raghavasimhan Sankarambadi Ram

