Python with non-latin-1 PYTHONHOME path

Question

In my case, I embedded Python into my application. When the path of my application contains a non-latin-1 character, Py_Initialize calls exit(1) internally (more information below).

So I checked whether I could reproduce this with the standard interpreter executable.

Python 2.7.x on Windows doesn't seem to work when the PYTHONHOME path contains a character outside the latin-1 charset. The problem is that the site module cannot be found and imported. Since umlauts seem to work, what is the actual limitation here? Is only latin-1 supported? Why does it work on OS X then?

C:\Users\ъ\Python27\python.exe    // fails to start (KOI8-R)
         ^
C:\Users\ġ\Python27\python.exe    // fails to start (latin-3)
         ^
C:\Users\ä\Python27\python.exe    // works fine (latin-1)
         ^

Any ideas?

Background:

I haven't stepped through the code yet, but Python 2.6 and Python 2.7 also behave differently when site is not available. Python 2.6 just prints a message; Python 2.7 refuses to start.

static void
initsite(void)
{
    PyObject *m;
    m = PyImport_ImportModule("site");
    if (m == NULL) {
        ...

        // Python 2.7 and later
        exit(1);

        // Python 2.6 and prior
        PyFile_WriteString("'import site' failed; traceback:\n", f);
    }
    ...
}

Python 2.7: https://github.com/enthought/Python-2.7.3/blob/master/Python/pythonrun.c#L725

Python 2.6: https://github.com/python-git/python/blob/master/Python/pythonrun.c#L705
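
One way to narrow this down from the outside is to start the interpreter with the -S command-line option, which skips the automatic 'import site', and print the paths it computed. The sketch below is only a diagnostic idea, not something from the original question: the helper name probe_interpreter is hypothetical, and whether a 2.7 interpreter in one of the failing directories actually gets far enough to print anything is an untested assumption.

```python
import subprocess
import sys


def probe_interpreter(python_exe):
    """Run python_exe with -S (no 'import site') and return what it prints.

    The child prints repr() of sys.prefix and sys.path so that any
    non-ASCII characters show up in a readable, escaped-or-literal form.
    """
    code = "import sys; print(repr(sys.prefix)); print(repr(sys.path))"
    out = subprocess.check_output([python_exe, "-S", "-c", code])
    # Decode defensively: we only want something printable for inspection.
    return out.decode("ascii", "replace")


if __name__ == "__main__":
    # Probe the interpreter running this script; pass the path of the
    # problematic python.exe instead to inspect that installation.
    print(probe_interpreter(sys.executable))
```

If the printed sys.path entries show the non-latin-1 directory name replaced or mangled, that would point at the path computation rather than at the import machinery itself.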

Have you tried using Python 3 instead? They redid the Unicode handling, and it's much cleaner. My recommendation is actually to use 3 whenever you can, and 2 only if you have to. – A. L. Flanagan, May 25 '16 at 20:45
In Python 3 it works (or should), yes. I have to stick with Python 2 because that is the version embedded in our software; this will change in the future, though. – HelloWorld, May 25 '16 at 21:27
Can you elaborate on how you "embed" Python in your app? Are you calling it from C/C++? What mechanism do you use? Do you set PYTHONHOME, and if so, how? As a side note, the behaviour of the OS file system with respect to Unicode paths varies quite a bit between Windows, Mac and Linux/POSIX, and dealing with this in CPython 2 needs a bit of fiddling at times, though I have wrestled with it successfully a few times. – Philippe Ombredanne, May 26 '16 at 10:15
Using *Py_Initialize, ...* from the C API. I tried *PYTHONHOME* and the corresponding C functions (*Py_SetPath*, *Py_SetPythonHome*, ...) with no success. Btw, Python 2.7 (without being embedded) doesn't work either if installed at the given paths. – HelloWorld, May 26 '16 at 15:31
MS Windows differs from OS X in that its fundamental character set is UTF-16. For backward compatibility, it also provides an "ANSI" API, which uses single-byte strings but cannot represent the whole Unicode range. I'm pretty sure Python 2 will never be upgraded to use the fully Unicode-capable Win32 API, so any effort is futile unless you at least upgrade to Python 3. – Ulrich Eckhardt, May 27 '16 at 06:46

Serge Ballesta · Answer 1 · 2016-05-19T11:44:03.863

2

I think the problem is that, internally, Python 2 processes everything as byte strings in the platform's system encoding, which (in Western Europe) is CP1252, a variant of Latin-1. So it is no surprise that it cannot correctly process a PYTHONHOME path containing other characters.
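
To make the code page argument concrete, here is a minimal sketch that checks which of the three test characters from the question can be encoded in a few single-byte code pages. It only illustrates the encodings themselves; it does not reproduce what Python 2 does internally. The helper name encodable and the choice of code pages are assumptions made for this illustration.

```python
# -*- coding: utf-8 -*-
# Illustration only: which of the question's test characters fit in a
# given single-byte code page? Codec names are Python's own codec names.


def encodable(ch, codec):
    """Return True if the character can be encoded with the given codec."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# The three directory names used in the question, as code points.
CHARS = [u"\u044a", u"\u0121", u"\u00e4"]
CODECS = ["latin-1", "cp1252", "cp1251", "cp866"]

for ch in CHARS:
    fits = [codec for codec in CODECS if encodable(ch, codec)]
    print("U+%04X fits in: %s" % (ord(ch), ", ".join(fits) or "none"))
```

Of the three characters, only U+00E4 has a CP1252 encoding, which matches the observation that the umlaut path works on a Western European system. U+044A does have an encoding in CP1251 and CP866, so the system code page alone would not explain a failure on a Russian Windows installation.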

But when I was younger, I was used to the good old 8.3 format of MS-DOS file names...

I can still see them (and use them) on a Windows 7 box with DIR /X in a console (CMD.EXE) window. This format only uses ASCII uppercase characters and the tilde (~), so it could be used as a workaround: just declare the 8.3 path in the PYTHONHOME environment variable, and start Python with that 8.3 path.

BTW, it is advisable for PYTHONHOME to use a path that contains neither special characters nor spaces. Such a path might work, but it could cause problems with other modules.
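
The 8.3 workaround can also be scripted instead of reading the short name off DIR /X by hand. The following is a minimal sketch, assuming a launcher script that prepares the environment before starting the real application. It calls the Win32 function GetShortPathNameW through ctypes; the helper names short_path and env_with_pythonhome are hypothetical. On non-Windows systems the path is returned unchanged, and on volumes where 8.3 names are disabled Windows may simply return the long path.

```python
import ctypes
import os


def short_path(long_path):
    """Return the 8.3 form of long_path on Windows, else the path unchanged.

    long_path should be a text (unicode) string so the wide-character
    API receives the name without going through the ANSI code page.
    """
    if os.name != "nt":
        return long_path  # 8.3 names only exist on Windows
    get_short = ctypes.windll.kernel32.GetShortPathNameW
    get_short.argtypes = [ctypes.c_wchar_p, ctypes.c_wchar_p, ctypes.c_uint]
    get_short.restype = ctypes.c_uint
    # First call with an empty buffer asks for the required size (incl. NUL).
    size = get_short(long_path, None, 0)
    if size == 0:
        raise ctypes.WinError()
    buf = ctypes.create_unicode_buffer(size)
    if get_short(long_path, buf, size) == 0:
        raise ctypes.WinError()
    return buf.value


def env_with_pythonhome(python_home):
    """Return a copy of the environment with PYTHONHOME set to the 8.3 path."""
    env = dict(os.environ)
    env["PYTHONHOME"] = short_path(python_home)
    return env
```

The returned dictionary could then be passed as the env argument when the launcher starts the embedding application, so the current process environment is left untouched.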

edited May 19 '16 at 11:44

answered May 19 '16 at 08:11


1

I would totally agree if it worked on a Russian Windows, because that has the corresponding system codepage (I guess it's CP1251). But it fails there as well. – HelloWorld May 19 '16 at 08:29
Just for completeness: the console codepage is 866 on a Russian OS. – HelloWorld May 19 '16 at 11:19
If 8.3 names are missing, check whether they're disabled: `fsutil behavior query Disable8dot3 C:`. Note that enabling 8.3 names will only affect new files subsequently created, not existing files. You could also try using `mklink` to create an ASCII-only hard link, symbolic link, or junction. – Eryk Sun May 20 '16 at 22:56

hkBst · Answer 2 · 2016-05-27T06:39:02.400

The 2.7 version of the PyImport_ImportModule function has this definition:

PyObject *
PyImport_ImportModule(const char *name)
{
    PyObject *pname;
    PyObject *result;

    pname = PyString_FromString(name);
    if (pname == NULL)
        return NULL;
    result = PyImport_Import(pname);
    Py_DECREF(pname);
    return result;
}

The 3.5 version of PyImport_ImportModule is the same, except that it uses

pname = PyUnicode_FromString(name);

instead of

pname = PyString_FromString(name);

You can look at the code for PyString_FromString and for PyUnicode_FromString, but it seems clear that Python 2 does not use Unicode here and Python 3 does. I have not been able to find how or where exactly this leads to the behavior you describe.

The PyImport_Import(module_name) function (version 2.7) only uses module_name like so:

r = PyObject_CallFunction(import, "OOOOi", module_name, globals,
                          globals, silly_list, 0, NULL);

passing on the responsibility...
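
For readers who find the C call hard to parse, here is a rough Python-level sketch of what that PyObject_CallFunction call amounts to: it looks up __import__ in the builtins and calls it with the module name, the same dictionary for globals and locals, a dummy from-list, and level 0. The function name import_like_c_api is hypothetical, and the from-list content is written from memory of the CPython source, so treat it as an assumption.

```python
try:
    import builtins                  # Python 3
except ImportError:
    import __builtin__ as builtins   # Python 2


def import_like_c_api(name):
    """Approximate PyImport_Import(name) when no Python frame is running."""
    # Minimal globals that only give access to the builtins.
    globals_ = {"__builtins__": builtins}
    import_hook = builtins.__import__
    # Mirrors the "OOOOi" arguments: name, globals, locals, fromlist, level.
    # A non-empty fromlist makes __import__ return the named module itself
    # rather than its top-level package.
    return import_hook(name, globals_, globals_, ["__doc__"], 0)


module = import_like_c_api("site")
print(module.__name__)
```

Nothing in this call carries a file system path: the module name "site" is plain ASCII in both Python versions, so the difference between PyString_FromString and PyUnicode_FromString is unlikely to matter for the name itself. The directory lookup happens further down, inside whatever __import__ does with sys.path.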

Just some background FYI: Python 2 does Unicode, but the Unicode handling was completely redone for Python 3. Python 2 used a "best guess" method of decoding, and if it guessed wrong, all hell would break loose. Python 3 treats Unicode strings as strings, and encoded Unicode as byte arrays, forcing you to explicitly handle conversion if necessary. – A. L. Flanagan, May 25 '16 at 20:43
I expect the issue is located somewhere in **PyImport_Import**. I guess the lookup for directories with Unicode characters in their path fails. As mentioned, I haven't debugged it though. At least latin-1 is still supported at this stage. – HelloWorld, May 25 '16 at 21:28
