How to properly run python script with UTF-16BE encoding?

Question

I have a source file test.py encoded in UTF-16BE:

# coding=UTF-16BE

print "test utf-16"

When I run the following command in my bash:

python test.py

Nothing is printed in my terminal. Why is that? How should I deal with it? Does it depend on my bash's default encoding?

score 0 · Answer 1 · answered Apr 12 '18 at 06:48

After searching around, I finally found the explanation in bobince's answer to a similar question. Because I am using UTF-16BE as the encoding for my Python source file, the magic comment:

# coding=UTF-16BE

is itself encoded in UTF-16BE. Because UTF-16BE is not ASCII-compatible, Python cannot detect the encoding by reading the comment as ASCII. As a result, the script cannot be run properly.
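To illustrate the point, here is a minimal sketch that applies the coding-declaration pattern given in PEP 263 to the same comment encoded two ways. The name CODING_RE is just a label chosen here, and the real tokenizer does more than this single regex, so treat it as an approximation of the detection step rather than Python's actual implementation.

```python
import re

# Coding-declaration pattern as given in PEP 263, applied to raw bytes.
# CODING_RE is a name picked for this sketch only.
CODING_RE = re.compile(br"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")

line = u"# coding=UTF-16BE"
ascii_bytes = line.encode("ascii")
utf16be_bytes = line.encode("utf-16-be")

# The ASCII form is recognized as a coding declaration.
print(CODING_RE.match(ascii_bytes) is not None)    # True

# The UTF-16BE form starts with a NUL byte and has a NUL before every
# ASCII character, so the pattern does not match.
print(CODING_RE.match(utf16be_bytes) is not None)  # False
print(b"\x00" in utf16be_bytes)                    # True
```

Every ASCII character in the UTF-16BE version is preceded by a zero byte, which is why the declaration is invisible to anything that reads the first lines as ASCII.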

score 0 · Answer 2 · answered Apr 12 '18 at 08:54


You should use something like:

#!/usr/bin/python
# -*- coding: utf-16be -*-

on the first or second line of the file. The important parts are the word coding, followed by : or =, and then the codec name (so the form in the other answer is fine too, as long as it is at the top). See PEP 263 for the syntax.

You should also check that the file does not start with a BOM (a BOM is allowed in generic UTF-16, but not when the endianness is specified, as in UTF-16BE or UTF-16LE). Editors often get this wrong.
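One way to check for a BOM without relying on the editor is to look at the first two bytes of the file. This is a minimal sketch: has_utf16_bom is a hypothetical helper name, and it only looks for the two UTF-16 byte order marks defined in the standard codecs module.

```python
import codecs

def has_utf16_bom(data):
    """Return True if the byte string starts with a UTF-16 BOM.

    has_utf16_bom is a helper name invented for this sketch.
    """
    return data.startswith((codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE))

# Explicit "utf-16-be" never writes a BOM; generic "utf-16" does.
print(has_utf16_bom(u"# coding=UTF-16BE".encode("utf-16-be")))  # False
print(has_utf16_bom(u"# coding=UTF-16BE".encode("utf-16")))     # True
```

For a file on disk, read it in binary mode and pass the first few bytes, for example open("test.py", "rb").read(2).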

But in general I would recommend using UTF-8 as the encoding for source code: it is much better supported by editors, and it is the default in Python 3. Both UTF-8 and UTF-16 are just encodings of Unicode, so the support should in principle be the same. Note: internally, Python 2 uses a UTF-16-like representation (UCS-2, or UCS-4 on wide builds), and Python 3.3+ selects Latin-1, UCS-2, or UCS-4 dynamically per string. But forget about the internals; this is a question of the editor.
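If the file already exists in UTF-16BE, re-encoding it as UTF-8 can be sketched as below. The function name convert_utf16be_to_utf8 and the paths passed to it are assumptions made for this example; the sketch decodes the whole file, drops a leading BOM if one is present, and writes the text back as UTF-8 with line endings left untouched.

```python
import io

def convert_utf16be_to_utf8(src_path, dst_path):
    """Re-encode a UTF-16BE text file as UTF-8 (hypothetical helper)."""
    # newline="" keeps the original line endings unchanged.
    with io.open(src_path, "r", encoding="utf-16-be", newline="") as src:
        text = src.read()

    # Drop a leading BOM if the editor added one.
    if text.startswith(u"\ufeff"):
        text = text[1:]

    with io.open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)
```

After converting, the coding comment inside the file still names UTF-16BE, so it should be changed to utf-8 (or removed) by hand.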

Note: The source encoding does not matter at run time. The default encodings used to read and write files and to write to stdout are independent of the source encoding; they depend only on the OS and environment.

Giacomo Catenazzi


Thanks for your answer. Have you tried it yourself? The two lines you mentioned don't work for me. – VeryLazyBoy Apr 12 '18 at 09:00

Now yes, and strangely with UTF16 it gives me errors (before the encoding is parsed), while with UTF16-XE I get nothing. So it seems that Python does not work with multibyte encodings (`\0` will probably terminate the source). I'm looking for official references. – Giacomo Catenazzi Apr 12 '18 at 09:17
Yeah, there must be something wrong with the `\0` bytes that UTF16-BE produces. – VeryLazyBoy Apr 12 '18 at 09:30
Coding comments do not work for UTF16 (PEP link, Section "Concept", point 1, second paragraph), so all answers (mine too) are incorrect. The parser allows changing the encoding (not sure about multibyte), but I cannot find how to set it (per file/module) [in the multibyte case]. – Giacomo Catenazzi Apr 12 '18 at 10:05
