2

How should I write "mąka" in Python without an exception?

I've tried var = u"mąka", var = unicode("mąka") and so on, but nothing helps.

I have a coding declaration on the first line of my file, and I still get this exception:

'utf8' codec can't decode byte 0xb1 in position 0: unexpected code byte

Artjom B.
Driego

4 Answers

4

Save the following 2 lines into write_mako.py:

# -*- encoding: utf-8 -*-
open(u"mąka.txt", 'w').write("mąka\n")

Run:

$ python write_mako.py

A file named mąka.txt containing the word mąka should be created in the current directory.

If it doesn't work, you can use chardet to detect the actual encoding of the source file (see the chardet example usage):

import chardet

print chardet.detect(open('write_mako.py', 'rb').read())

In my case it prints:

{'confidence': 0.75249999999999995, 'encoding': 'utf-8'}
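For a quick check without a third-party package, in the spirit of the repr() suggestion in the comments, a stdlib-only sketch along these lines may be enough. The helper name try_decodings and the list of candidate encodings are assumptions made for illustration (ISO-8859-2 and cp1250 are common legacy choices for Polish text), and the sample bytes stand in for what you would read from your own file:

```python
from __future__ import print_function


def try_decodings(raw, candidates=("utf-8", "iso-8859-2", "cp1250")):
    """Map each candidate encoding to the decoded text, or None if decoding fails."""
    results = {}
    for name in candidates:
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results


# Stand-in for: raw = open('write_mako.py', 'rb').read()
raw = b"m\xb1ka"
for name, text in sorted(try_decodings(raw).items()):
    print(name, repr(text))
```

Several single-byte encodings will decode almost any input without raising, so a successful decode only narrows things down; you still have to look at which result shows the text you expected.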
Community
jfs
  • Desperate times and all that. – Paul D. Waite Dec 22 '09 at 22:39
  • @John: yes, the OP's problem is most probably that the source file's actual encoding doesn't match the one declared in the `-*- encoding:` line. – jfs Dec 22 '09 at 22:39
  • @J.F. Sebastian: Most probably, but IMHO telling an OP to import an unfamiliar 3rd party package for a simple debug job is like telling him to get a cannon to kill a mosquito. If he were to show us the results of `print repr(open("my_tiny_script.py", "rb").read())` we'd be able to sort him out very soon. It would also help if he'd tell us which editor he's using on what OS. – John Machin Dec 23 '09 at 02:05
2

The # -*- coding: -*- line must specify the encoding the source file is saved in. This error message:

'utf8' codec can't decode byte 0xb1 in position 0: unexpected code byte

indicates you aren't saving the source file in UTF-8. You can save your source file in any encoding that supports the characters you use in the source code; just make sure you know which encoding it is and that the coding line names it.
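To see where a byte like 0xb1 can come from, here is a small sketch. It assumes the file was saved as ISO-8859-2, which is only a guess: that is one encoding in which "ą" is the single byte 0xb1, but the question does not say which editor or encoding was used.

```python
from __future__ import print_function

# The word from the question, with U+0105 written as an escape so that this
# sketch does not depend on how the script itself is saved.
word = u"m\u0105ka"

utf8_bytes = word.encode("utf-8")
latin2_bytes = word.encode("iso-8859-2")  # assumed legacy encoding
print(repr(utf8_bytes))
print(repr(latin2_bytes))

# Decoding the ISO-8859-2 bytes as UTF-8 fails on the 0xb1 byte, which is what
# happens when the coding line says utf-8 but the file is saved differently.
error = None
try:
    latin2_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    error = exc
    print(exc)
```

In UTF-8 the same letter takes two bytes (0xc4 0x85), so a lone 0xb1 is not valid there.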

Mark Tolonen
  • you're probably right. Driego should try replacing utf-8 with the `sys.getdefaultencoding()` value – mykhal Dec 22 '09 at 21:55
1

What exception are you getting?

You might try saving your source code file as UTF-8, and putting this at the top of the file:

# coding=utf-8

That tells Python that the file’s saved as UTF-8.
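If your editor has no obvious "save as UTF-8" option, the file can also be converted with a few lines of Python. This is a minimal sketch: the function name resave_as_utf8 and the file names in the usage note are hypothetical, and you have to supply the encoding the file is currently saved in.

```python
import io


def resave_as_utf8(src_path, dst_path, source_encoding):
    """Read src_path as source_encoding and write the same text to dst_path as UTF-8."""
    # newline="" keeps the original line endings untouched in both directions.
    with io.open(src_path, "r", encoding=source_encoding, newline="") as src:
        text = src.read()
    with io.open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)
```

For example, resave_as_utf8("my_script.py", "my_script_utf8.py", "iso-8859-2") would convert a script saved as ISO-8859-2. Writing to a separate file leaves the original in place in case the guessed source encoding turns out to be wrong.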

Paul D. Waite
  • I have: # -*- coding: utf-8 -*- Does it make any difference? But when I changed it, still nothing happened... – Driego Dec 22 '09 at 17:59
  • This needs to be the first or the second line in the file, per PEP 0263 (http://www.python.org/dev/peps/pep-0263/). Also, if you still get an exception, please specify which exception it is so it's easier to try and help. – Michał Marczyk Dec 22 '09 at 18:03
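On whether the spelling of the declaration matters: PEP 263 recognises the coding line with a regular expression, so `# coding=utf-8`, `# -*- coding: utf-8 -*-` and `# -*- encoding: utf-8 -*-` are all accepted. The sketch below applies that pattern to the first two lines only; the helper name declared_encoding is made up for illustration, and this is a simplification of what the interpreter does, not a replacement for it.

```python
import re

# Pattern given in PEP 263 for recognising the declaration.
CODING_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def declared_encoding(source_lines):
    """Return the encoding named on line 1 or 2 of the source, or None."""
    for line in source_lines[:2]:
        match = CODING_RE.match(line)
        if match:
            return match.group(1)
    return None


print(declared_encoding(["#!/usr/bin/env python\n", "# -*- coding: utf-8 -*-\n"]))
print(declared_encoding(["import sys\n", "\n", "# coding=utf-8\n"]))
```

The second call returns None because the declaration sits on the third line, where it is ignored.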
1

This code works for me, saving the file as UTF-8:

v = u"mąka"
print repr(v)

The output I get is:

u'm\u0105ka'

Please copy and paste the exact error you are getting. If you are getting this error:

UnicodeEncodeError: 'charmap' codec can't encode character ... in position ...: character maps to <undefined>

Then you are trying to output the character somewhere that does not support UTF-8 (e.g. your shell's character encoding is set to something other than UTF-8).
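One way to check for that case and work around it is to look at the encoding the output stream reports and escape whatever it cannot represent. This is a minimal sketch: the helper name printable is made up, and falling back to "ascii" when no encoding is reported is an assumption.

```python
from __future__ import print_function

import sys


def printable(text, encoding):
    """Return text with characters the encoding cannot represent shown as backslash escapes."""
    return text.encode(encoding, "backslashreplace").decode(encoding)


word = u"m\u0105ka"
# sys.stdout.encoding can be None, for example when output is piped.
encoding = sys.stdout.encoding or "ascii"
print(encoding)
print(printable(word, encoding))
```

On a UTF-8 terminal the word is printed unchanged; on one whose encoding lacks the letter, it is shown in escaped form instead of raising UnicodeEncodeError.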

Steven Kryskalla