6

Some of my application's libraries depend on being able to print UTF-8 characters to stdout and stderr, so this must not fail:

print('\u2122')

On my local machine it works, but on my remote server it raises UnicodeEncodeError: 'ascii' codec can't encode character '\u2122' in position 0: ordinal not in range(128)

I tried $ PYTHONIOENCODING=utf8 with no apparent effect.

sys.stdout = codecs.getwriter("utf-8")(sys.stdout.detach())

works for a while, then stalls and finally fails with ValueError: underlying buffer has been detached

sys.getdefaultencoding() returns 'utf-8', and sys.stdout.encoding returns 'ANSI_X3.4-1968'
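
For context, 'ANSI_X3.4-1968' is the name the C/POSIX locale gives to plain ASCII, which is why the error mentions the 'ascii' codec even though sys.getdefaultencoding() says 'utf-8'. A minimal sketch that reproduces the difference without touching the real stdout, assuming an in-memory stream behaves like the terminal stream here (`try_write` is a hypothetical helper, not part of any library):

```python
import codecs
import io

# Python resolves the locale's name for this charset to its ASCII codec.
codec_name = codecs.lookup("ANSI_X3.4-1968").name


def try_write(encoding):
    """Write U+2122 to an in-memory text stream that uses `encoding`.

    Returns the encoded bytes, or None if the character cannot be encoded.
    """
    stream = io.TextIOWrapper(io.BytesIO(), encoding=encoding)
    try:
        stream.write("\u2122")
        stream.flush()
    except UnicodeEncodeError:
        return None
    return stream.buffer.getvalue()


print(codec_name)
print(try_write("ANSI_X3.4-1968"))
print(try_write("utf-8"))
```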

What can I do? I don't want to edit third-party libraries.

ShadowRanger
Mirac7
  • I think that [this link](http://stackoverflow.com/questions/2276200/changing-default-encoding-of-python) is going to help you. – Samuel PS Oct 18 '16 at 17:04
  • 4
    Side-note: Even if everything else worked, `PYTHONIOENCODING=utf8` won't work unless you `export` it (or prefix the Python launch with it). Otherwise, it's a local variable in `bash` that isn't inherited in the environment of child processes. `export PYTHONIOENCODING=utf-8` would both set and export it in `bash`. – ShadowRanger Oct 18 '16 at 17:08
  • 2
    @SamuelPS: The suggestion of the top answer there is... suboptimal. [Forcibly reloading `sys` to regain access to `setdefaultencoding` can cause problems](http://stackoverflow.com/a/3828742/364696), and in any event, the correct solution on modern Python (>=3.3) is to make sure your system is using a broadly useful full Unicode supporting default encoding globally. Anything else means you're using hacks to output characters the OS officially doesn't even recognize, and dependent on it playing along despite it claiming it won't work. – ShadowRanger Oct 18 '16 at 17:22

2 Answers

5

From @ShadowRanger's comment on my question,

PYTHONIOENCODING=utf8 won't work unless you export it (or prefix the Python launch with it). Otherwise, it's a local variable in bash that isn't inherited in the environment of child processes. export PYTHONIOENCODING=utf-8 would both set and export it in bash.

export PYTHONIOENCODING=utf-8 did the trick: UTF-8 characters no longer raise UnicodeEncodeError.
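
The point about inheritance can be checked from Python itself: the variable only matters if it is present in the environment of the interpreter that does the printing. A rough sketch, assuming a Unix-like system and using a hypothetical helper `run_child` that starts a second interpreter with an explicitly supplied environment:

```python
import os
import subprocess
import sys


def run_child(env_overrides):
    """Print U+2122 from a child interpreter; return (exit code, raw stdout)."""
    env = dict(os.environ)
    env.update(env_overrides)  # these entries are what the child inherits
    result = subprocess.run(
        # The escape is passed as plain ASCII and decoded by the child.
        [sys.executable, "-c", "print('\\u2122')"],
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    return result.returncode, result.stdout


code, out = run_child({"PYTHONIOENCODING": "utf-8"})
print(code, out)
```

With the variable in the child's environment, the child exits cleanly and emits the UTF-8 bytes for the character. What happens without it depends on the locale and the Python version, so that case is deliberately not shown.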

Mirac7
0

I'm guessing you're on a UNIX-like system, and your environment sets LANG (or LC_ALL or whatever) to C.

Try editing your default shell's startup file to set LANG to something like en_US.utf-8 (or whatever locale makes sense for you). For example, in bash, edit ~/.bash_profile (or ~/.profile if you're using that instead for sh compatibility) and add:

export LANG="en_US.utf-8"

For (t)csh, edit ~/.cshrc (or ~/.tcshrc if that's what you're using) to add:

setenv LANG "en_US.utf-8"

Making the changes "live" doesn't work, because your shell is likely hosted in a terminal that has configured itself solely for ASCII display, based on the LANG=C in effect when it was launched (and many terminals do session coalescence, so even if you changed LANG and then launched a new terminal, it would coalesce with the shared terminal process with the out-of-date LANG). So after you change ~/.bash_profile, log out and then log back in so your root shell will set LANG correctly for every other process (since they all ultimately fork from the root shell).
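
After logging back in, it may help to confirm what the Python process actually sees, since a value shown by echo $LANG in one shell is not necessarily what every process inherited. A small sketch, assuming the usual locale variables are the relevant ones (`encoding_report` is a hypothetical helper name):

```python
import locale
import os
import sys


def encoding_report(environ=os.environ):
    """Collect the locale variables and the encodings Python derived from them."""
    return {
        "LC_ALL": environ.get("LC_ALL"),
        "LC_CTYPE": environ.get("LC_CTYPE"),
        "LANG": environ.get("LANG"),
        "PYTHONIOENCODING": environ.get("PYTHONIOENCODING"),
        # Encoding Python infers from the locale settings.
        "preferred": locale.getpreferredencoding(False),
        # Encodings actually in use on the standard streams.
        "stdout": getattr(sys.stdout, "encoding", None),
        "stderr": getattr(sys.stderr, "encoding", None),
    }


for key, value in encoding_report().items():
    print(key, "=", value)
```

If "stdout" still reports an ASCII variant while LANG looks correct, the variable is probably being set somewhere that the launching process never reads.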

ShadowRanger
  • `echo $LANG` is returning `en_US.UTF-8` by default – Mirac7 Oct 18 '16 at 17:46
  • @Mirac7: That doesn't necessarily mean it was set that way at login. If you set it in your `.bashrc` for instance (and per normal setup, `.bashrc` isn't run for login shells), you'd see it in your terminals, but the terminals themselves wouldn't see it. – ShadowRanger Oct 18 '16 at 20:38