
I'm writing a Python script that generates output containing UTF-8 characters. Even though most Linux terminals use UTF-8 by default, I'm writing the code on the assumption that the terminal might not be using UTF-8 (in case the user changed it for some reason).

From what I tested, os.environ["LANG"] = "en_US.utf-8" does not change the system environment variable; it only changes the value seen inside the Python process.

maviz
  • Why won't you just `.encode(sys.stdout.encoding)` your Unicode output? Otherwise, see [`man 5 locale`](https://linux.die.net/man/5/locale); basically you need to set an environment variable and then run your program. – 9000 Mar 05 '17 at 01:38
  • Actually using `LANG=en_US.utf-8` solves it, but for some reason I can't use it with `os.system("LANG=en_US.utf-8")`, `subprocess.call("LANG=en_US.utf-8", shell=True)` or `subprocess.Popen("LANG=en_US.utf-8", shell=True)` – maviz Mar 05 '17 at 02:50
  • This is because it's not an executable! [Pass `env` to `Popen` instead](http://stackoverflow.com/a/26643847/223424). – 9000 Mar 05 '17 at 15:26
  • You're right. `LANG` is actually an environment variable whose value I'm trying to change from within a Python script. I tried using `env` as a parameter to `Popen()`. It returns `0`. I rephrased the question and its details to clarify what I need. – maviz Mar 06 '17 at 01:29
  • `subprocess.call("export LANG=en_US.utf-8", shell=True)` will start a child process, set the environment for that child process, and then the child process exits. It will have no effect: you can't change the parent's environment from a child process. – Penguin Brian Mar 06 '17 at 04:42

2 Answers


I think you're overdoing it. Python comes with batteries included; just use them.

A correctly configured terminal session has the LANG environment variable set; it describes which encoding the terminal expects as output from programs running in this session.

The Python interpreter detects this setting and sets sys.stdout.encoding accordingly. It then uses that encoding to turn any Unicode output into the correct byte sequence. (If you're sending a byte sequence, you're on your own, and likely know what you're doing; maybe you're sending a binary stream, not text at all.)

So, if you output your text as Unicode, it should appear correctly without any extra work, provided that all the characters can be encoded in the terminal's encoding.
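As a rough illustration of the above, here is a minimal sketch that inspects what Python picked for stdout and checks whether a string fits that encoding. The helper names `describe_stdout` and `can_encode` are made up for this example; only `sys.stdout.encoding`, `sys.stdout.errors` and `str.encode` come from the standard library.

```python
import sys


def describe_stdout():
    """Return the encoding and error handler Python chose for stdout."""
    # Both attributes can be missing or None if stdout was replaced,
    # so fall back to a placeholder instead of failing.
    encoding = getattr(sys.stdout, "encoding", None) or "unknown"
    errors = getattr(sys.stdout, "errors", None) or "unknown"
    return encoding, errors


def can_encode(text, encoding):
    """Report whether every character in text fits the given encoding."""
    try:
        text.encode(encoding)
    except (UnicodeEncodeError, LookupError):
        # LookupError covers an unknown encoding name.
        return False
    return True


if __name__ == "__main__":
    encoding, errors = describe_stdout()
    print("stdout encoding:", encoding, "| error handler:", errors)
    sample = "caf\u00e9"
    print("sample fits stdout encoding:", can_encode(sample, encoding))
```

If `can_encode` returns False for your text, a plain `print` would raise `UnicodeEncodeError` under the default strict error handler, which is the case where you need your own error handling.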

If you need finer control, pick the output encoding, encode with your own error handling, and output the bytes.
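One possible way to do that, assuming Python 3 where `sys.stdout.buffer` exposes the underlying binary stream, is sketched below. The function names `encode_for_terminal` and `write_bytes` are hypothetical, and the choice of `"replace"` as the default error handler is just one option among the standard ones (`"ignore"`, `"backslashreplace"`, `"xmlcharrefreplace"`, and so on).

```python
import sys


def encode_for_terminal(text, encoding=None, errors="replace"):
    """Encode text for the terminal, substituting characters that do not fit."""
    # Fall back to stdout's encoding, then to ASCII if nothing is known.
    target = encoding or getattr(sys.stdout, "encoding", None) or "ascii"
    return text.encode(target, errors=errors)


def write_bytes(data):
    """Write already-encoded bytes to stdout, bypassing the text layer."""
    buffer = getattr(sys.stdout, "buffer", None)
    if buffer is None:
        raise RuntimeError("stdout has no binary buffer to write to")
    sys.stdout.flush()  # keep ordering with earlier text-mode prints
    buffer.write(data)
    buffer.flush()


if __name__ == "__main__":
    write_bytes(encode_for_terminal("caf\u00e9\n"))
```

With `errors="replace"`, characters the terminal cannot show come out as `?` instead of raising an exception, so the program keeps running on a non-UTF-8 terminal.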

You're not in the business of changing the terminal session's settings, unless you're writing a tool specifically to do that. The user has configured the session; a well-behaved program adapts to that configuration rather than altering it.

9000

It is not clear what you want to see happen when you change the LANG environment variable. If you want to test your Python code with other character encodings, you will need to set LANG before starting the Python code, as I believe LANG is read when Python first starts.
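For testing, one way to "set LANG before starting Python" from inside a script is to launch a fresh interpreter with a modified copy of the environment. This is only a sketch: the helper name `run_with_locale` is invented here, and the encoding the child reports depends on the Python version and on which locales are installed, so no particular output is promised.

```python
import os
import subprocess
import sys


def run_with_locale(code, lang):
    """Run a Python snippet in a child interpreter started with LANG=lang."""
    env = dict(os.environ)  # copy, so the parent's environment is untouched
    env["LANG"] = lang
    # These variables take precedence over LANG or override the stdout
    # encoding directly, so remove them from the child's environment.
    for name in ("LC_ALL", "LC_CTYPE", "PYTHONIOENCODING"):
        env.pop(name, None)
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,
    )
    return result.stdout.decode("ascii", "replace").strip()


if __name__ == "__main__":
    snippet = "import sys; print(sys.stdout.encoding)"
    for lang in ("C", "en_US.UTF-8"):
        print(lang, "->", run_with_locale(snippet, lang))
```

The parent process keeps its own LANG throughout; only the child sees the new value, which matches the point made in the comments that a child cannot change its parent's environment.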

There might(?) be a function you can call to change LANG after Python has started; however, if this is for testing purposes, I recommend setting it before running the Python code.

An even better approach, however, would be to change LANG in your terminal program so that it has the correct encoding. That said, almost everyone should be using UTF-8, so I am not really sure you need to test non-UTF-8 setups anymore.

Penguin Brian