
I'm trying to redirect the output of a Python script into a new file, using Cmder/ConEmu on Windows 10. The output contains non-ASCII characters, and I need the new file to be encoded as UTF-8.

Simplified, my script looks like this:

print('Bärlauch')

Then I use this command to redirect its output to a new file.

λ python example_script.py > new_file.txt

If I then check the properties of the file, its encoding is reported as ISO-8859-1 (Latin-1).

λ file -i new_file.txt
new_file.txt: text/plain; charset=iso-8859-1

For further processing, I need it to be UTF-8. I have been searching for quite some time without finding a solution. Is it impossible to change the encoding that the redirect operator uses?

EDIT: I set the codepage to UTF-8 before running the command that creates the new file, but the encoding remains Latin-1.

λ chcp 65001
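
One way to check which encoding Python actually picks for stdout in each situation is a small diagnostic like the one below. This is only a sketch, assuming Python 3; `describe_stdout` is a hypothetical helper name. It reports to stderr so the result stays visible when stdout is redirected with `>`.

```python
import locale
import sys


def describe_stdout():
    """Collect the facts that decide how print() encodes its output."""
    return {
        # False when stdout is redirected to a file or a pipe
        "isatty": sys.stdout.isatty(),
        # Encoding Python chose for the stdout text stream
        "stdout_encoding": sys.stdout.encoding,
        # Locale encoding (the ANSI code page on Windows)
        "locale_encoding": locale.getpreferredencoding(False),
    }


if __name__ == "__main__":
    # Write to stderr so the report is not swallowed by the redirect.
    print(describe_stdout(), file=sys.stderr)
```

Running it once directly and once with `> new_file.txt` should show whether `stdout_encoding` changes between the console and the redirected case.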
backendboi
  • Just checking - is the output of the python script definitely encoded as UTF-8? – snakecharmerb Mar 21 '19 at 19:15
  • Yes it is. Before printing, I decode from a bytestream with utf8 specified. But isn't this irrelevant after I've printed it to stdout? – backendboi Mar 21 '19 at 19:21
  • I think you're right, unless redirection counts as a "non-character device" [Windows stdout](https://docs.python.org/3/library/sys.html#sys.stdout): "... UTF-8 is used for the console device. Non-character devices such as disk files and pipes use the system locale encoding (i.e. the ANSI codepage)." – snakecharmerb Mar 21 '19 at 19:38
  • I recommend not using `print` but `write`, so that you have control over the encoding. The problem with `print`: it tries to use the terminal encoding (if printed directly) or the system encoding (if passed through a pipe). The two encodings can differ, so you may get problems. [OTOH you should not use UTF-8 when printing to a console that doesn't support UTF-8] – Giacomo Catenazzi Mar 22 '19 at 10:20
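
The suggestion in the last comment, writing the bytes yourself instead of relying on `print`, could look roughly like this. It is a minimal sketch, assuming Python 3; `write_utf8` is a hypothetical helper name, not part of any library. It encodes the text explicitly and writes to the binary layer `sys.stdout.buffer`, so a redirected file should receive UTF-8 bytes regardless of the ANSI code page.

```python
import io
import sys


def write_utf8(text, stream=None):
    """Encode text as UTF-8 and write it to a binary stream.

    Defaults to sys.stdout.buffer, the byte-level stream underneath
    sys.stdout, which bypasses the text layer's own encoding.
    """
    if stream is None:
        stream = sys.stdout.buffer
    # Note: the binary layer does not translate "\n" to "\r\n" on Windows.
    stream.write((text + "\n").encode("utf-8"))
    stream.flush()


if __name__ == "__main__":
    write_utf8("Bärlauch")
```

Two alternatives that leave the `print` calls untouched: calling `sys.stdout.reconfigure(encoding="utf-8")` at the top of the script (available from Python 3.7), or setting the `PYTHONIOENCODING` environment variable to `utf-8` before running the script.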

0 Answers