According to the Python Unicode HOWTO, the default encoding should not be ASCII if I set the LANG environment variable.

I have Python 2.7 on Mountain Lion, and the $LANG environment variable is "en_US.UTF-8". Running "sys.getfilesystemencoding()" returns "utf-8", but running "sys.getdefaultencoding()" returns "ascii".

When I ran the following:

struct.pack('12s',u'filename\u4500abc')

it failed with:

TypeError: Struct() argument 1 must be string, not unicode

Explicitly changing it to

struct.pack('12s',u'filename\u4500abc'.encode('utf-8'))

worked.

My question: what is the difference between "sys.getdefaultencoding" and "sys.getfilesystemencoding"? The first seems to be related to "struct.pack", so what is the second for? And how do I make 'utf-8' the default encoding when calling "struct.pack"?

Hailiang Zhang
  • "Still"? python 2.x is discontinued. – wRAR Aug 17 '13 at 19:14
  • @wRAR clearly OP means "even though I set LANG", not "in this brand new 2.7 version". – Wooble Aug 17 '13 at 19:16
  • I don't understand what ASCII has to do with the code you're showing here. There's no encoding problem; you're just using the wrong type object as an argument. – Wooble Aug 17 '13 at 19:20

1 Answer

Short answer:

I think $LANG is used by the OS, not by Python. See the long answer if you want more details.

Long answer:

Python 2.x uses ASCII as its default encoding. You can change it (I can't remember how offhand), but that is not recommended, because it will break libraries that assume ASCII strings. This all changed in 3.x, where the default encoding is UTF-8. I can't wait for Python 3.x to become the standard!
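To see what each setting reports on a given machine, a small sketch like the following may help. The helper name describe_encodings is made up for illustration. As far as I know, the default encoding is the one Python 2.x uses for implicit conversions between str and unicode, while the filesystem encoding is the one used to convert Unicode file names for the OS:

```python
import sys


def describe_encodings():
    """Return the two encodings discussed in the question (hypothetical helper)."""
    return {
        # Used for implicit str <-> unicode conversions in Python 2.x.
        "default": sys.getdefaultencoding(),
        # Used to convert Unicode file names into names the OS understands.
        "filesystem": sys.getfilesystemencoding(),
    }


info = describe_encodings()
print(info)
```

On the setup described in the question, this should report "ascii" for the default and "utf-8" for the filesystem encoding; on Python 3.x the default is reported as "utf-8".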

You can read more in "Unicode in Python", a great presentation that really helped me.

How to set the default, with a warning: this is how you set the default encoding, but don't use it. It will break libraries and cause more pain than simply calling encode and decode:

import sys
reload(sys)
sys.setdefaultencoding('utf-8')
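
As an alternative to changing the default, here is a rough sketch of encoding explicitly at the point where you call struct.pack. The helper names pack_name and unpack_name, and the defaults of 12 bytes and 'utf-8', are my own assumptions based on the example in the question, not anything struct provides:

```python
import struct


def pack_name(text, size=12, encoding="utf-8"):
    """Encode text explicitly, then pack it into a fixed-size field."""
    data = text.encode(encoding)
    # The 's' format pads with NUL bytes or truncates to `size` bytes.
    # Truncation works on bytes, so it could split a multi-byte character.
    return struct.pack("%ds" % size, data)


def unpack_name(packed, encoding="utf-8"):
    """Reverse pack_name: strip NUL padding and decode back to text."""
    (data,) = struct.unpack("%ds" % len(packed), packed)
    # "replace" keeps decoding from failing if truncation split a character.
    return data.rstrip(b"\x00").decode(encoding, "replace")


packed = pack_name(u"filename\u4500abc")
print(repr(packed))
```

Note that the encoded string from the question is 14 bytes long, so the 12-byte field drops the last two bytes.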
Smarties89
  • Thanks! But "sys.setdefaultencoding('utf-8')" returns "AttributeError: 'module' object has no attribute 'setdefaultencoding'" on my machine – Hailiang Zhang Aug 17 '13 at 19:09
  • I edited the answer; apparently you have to reload sys first. More details at http://stackoverflow.com/questions/11741574/how-to-set-the-default-encoding-to-utf-8-in-python – Smarties89 Aug 17 '13 at 19:15
  • @HailiangZhang "This function is only intended to be used by the site module implementation and, where needed, by sitecustomize. Once used by the site module, it is removed from the sys module’s namespace." – wRAR Aug 17 '13 at 19:15
  • Yes, so you should really think about whether this is what you want. It is generally a very bad idea. Once you get used to calling encode and decode, it is not that big a deal. – Smarties89 Aug 17 '13 at 19:17
  • @Martinb Thanks! "reload(sys)" did allow "defaultencoding" to be changed. However, the above "struct.pack" still failed with default. – Hailiang Zhang Aug 17 '13 at 19:22
  • After a quick Google search, I think struct uses ASCII as its default and that it can't be changed. I even found some people suggesting that Python 3.x uses ASCII as the default here too, which is kind of unexpected since 3.x should be all Unicode. So encode and decode are the only solution, which is the best way anyway :) – Smarties89 Aug 17 '13 at 19:34