This code should write some text to a file. When I write the text to the console, everything works. But when I try to write it to the file, I get a UnicodeEncodeError. I know that this is a common problem that can be solved with the proper decode or encode call, but I have tried that and still get the same UnicodeEncodeError. What am I doing wrong?

I've attached an example.

print "(%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)".decode("utf-8")%(dict.get('name'),dict.get('description'),dict.get('ico'),dict.get('city'),dict.get('ulCislo'),dict.get('psc'),dict.get('weby'),dict.get('telefony'),dict.get('mobily'),dict.get('faxy'),dict.get('emaily'),dict.get('dic'),dict.get('ic_dph'),dict.get('kategorie')[0],dict.get('kategorie')[1],dict.get('kategorie')[2])

(StarBuy s.r.o.,Inzertujte s foto, auto-moto, oblečenie, reality, prácu, zvieratá, starožitnosti, dovolenky, nábytok, všetko pre deti, obuv, stroj....

with open("test.txt","wb") as f:
   f.write("(%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)".decode("utf-8")%(dict.get('name'),dict.get('description'),dict.get('ico'),dict.get('city'),dict.get('ulCislo'),dict.get('psc'),dict.get('weby'),dict.get('telefony'),dict.get('mobily'),dict.get('faxy'),dict.get('emaily'),dict.get('dic'),dict.get('ic_dph'),dict.get('kategorie')[0],dict.get('kategorie')[1],dict.get('kategorie')[2]))

UnicodeEncodeError: 'ascii' codec can't encode character u'\u010d' in position 50: ordinal not in range(128)

Where could the problem be?

Milano

4 Answers

To write Unicode text to a file, you could use the io.open() function:

#!/usr/bin/env python
from io import open

with open('utf8.txt', 'w', encoding='utf-8') as file:
    file.write(u'\u010d')

On Python 3, the built-in open() is the same function as io.open().

Note: you should not use the binary file mode ('b') if you want to write text.

The # coding: utf8 declaration defines only the encoding of the source code; it has nothing to do with how data is written to a file.

If you see sys.setdefaultencoding() outside of site.py or Python tests, assume the code is broken.
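Applied to a record like the one in the question, the same approach might look like the sketch below. The record dict, its values, the fields tuple and the temporary path are hypothetical stand-ins, not the asker's real data; the non-ASCII character is written as an escape so the snippet does not depend on the source encoding.

```python
from io import open
import os
import tempfile

# Hypothetical stand-in for the asker's dict; u'\u010d' is the character
# named in the error message.
record = {
    'name': u'StarBuy s.r.o.',
    'description': u'oble\u010denie, reality',
}
fields = ('name', 'description', 'ico')  # 'ico' is missing, so .get() returns None

# Build the whole line as Unicode text first.
line = u'(%s,%s,%s)' % tuple(record.get(key) for key in fields)

# Text mode plus an explicit encoding: io.open() encodes the text on write.
path = os.path.join(tempfile.mkdtemp(), 'test.txt')
with open(path, 'w', encoding='utf-8') as f:
    f.write(line)

# Read the file back with the same encoding to check the round trip.
with open(path, encoding='utf-8') as f:
    read_back = f.read()
```

Here the text stays Unicode until it reaches the file object, so no implicit ASCII encoding takes place.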

jfs

@ned-batchelder is right. You have to declare that the system default encoding is "utf-8". The coding comment # -*- coding: utf-8 -*- doesn't do this.

To declare the system default encoding, you have to import the module sys, and call sys.setdefaultencoding('utf-8'). However, sys was previously imported by the system and its setdefaultencoding method was removed. So you have to reload it before you call the method.

So you will need to add the following code at the beginning:

import sys
reload(sys)
sys.setdefaultencoding('utf-8')
Cosmo

You may need to explicitly declare that Python uses UTF-8 encoding.

The answer to this SO question explains how to do that: Declaring Encoding in Python

Community
SW_user2953243
  • I have it already declared: # -*- coding: utf-8 -*- at the top of my code – Milano Jul 12 '14 at 22:51
  • The coding comment on the .py file only affects how the .py source is decoded. It has nothing to do with the way data is decoded. – Ned Batchelder Jul 12 '14 at 22:53
  • Then maybe you should ensure the strings in the dict are UTF-8 encoded before trying to decode them as such. You've only shown us a line of code, which isn't enough for me or probably anyone else to work with if it's a semantic issue. – SW_user2953243 Jul 12 '14 at 23:37
  • Do you want to write a unicode string or an ASCII string to the file? To write a unicode string, you probably should replace `decode('utf-8')` with `encode('utf-8')`. But I'm assuming the strings you are retrieving from the dict are ASCII encoded. If that is the case, then you don't need the `.decode('utf-8')` at all. – SW_user2953243 Jul 13 '14 at 00:22
  • @SW_user2953243 I'm getting the data from a web page that has "utf-8" in its head. I want to write this data into the file. And it says 'ordinal not in range(128)' – Milano Jul 13 '14 at 09:53
  • You probably need to parse the header until you get to the actual UTF-8 payload. Then pass the payload to Stream.Parse(). I don't think the header is UTF-8 encoded. – SW_user2953243 Jul 13 '14 at 12:58

For Python 2:

  1. Declare the source encoding at the top of the file (if not done yet):

    # -*- coding: utf-8 -*-

  2. Drop the .decode call, format the text first, and then call .encode on the formatted result (encoding only the format string still leaves a unicode object for f.write to encode as ASCII):

    with open("test.txt","wb") as f:
        f.write((u"(%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)"%(dict.get('name'),dict.get('description'),dict.get('ico'),dict.get('city'),dict.get('ulCislo'),dict.get('psc'),dict.get('weby'),dict.get('telefony'),dict.get('mobily'),dict.get('faxy'),dict.get('emaily'),dict.get('dic'),dict.get('ic_dph'),dict.get('kategorie')[0],dict.get('kategorie')[1],dict.get('kategorie')[2])).encode("utf-8"))
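A smaller, self-contained sketch of the binary-mode route may make the order of operations easier to see. It assumes the values are unicode objects, as the u'\u010d' in the error message suggests; name, description and the temporary path are hypothetical stand-ins. The point is that a file opened with "wb" expects bytes, so the fully formatted text is what gets encoded.

```python
import os
import tempfile

# Hypothetical values standing in for two of the asker's dict entries.
name = u'StarBuy s.r.o.'
description = u'oble\u010denie'

text = u'(%s,%s)' % (name, description)  # still Unicode text
data = text.encode('utf-8')              # UTF-8 bytes, safe for binary mode

path = os.path.join(tempfile.mkdtemp(), 'test.txt')
with open(path, 'wb') as f:
    f.write(data)

# Decode the bytes read back to confirm nothing was lost.
with open(path, 'rb') as f:
    round_trip = f.read().decode('utf-8')
```

Written this way, the snippet should behave the same on Python 2 and Python 3.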
    
ellockie