I was bitten by http://bugs.python.org/issue1681974 - quoting from there:

mkdtemp fails on Windows if Windows user name has any non-ASCII characters, like ä or ö, in it. mkdtemp throws an encoding error. This seems to be because the default temp dir in Windows is "c:\documents and settings\<user name>\local settings\temp"

The workaround the OP used is:

try: # workaround for http://bugs.python.org/issue1681974
    return tempfile.mkdtemp(prefix=prefix)
except UnicodeDecodeError:
    tempdir = unicode(tempfile.gettempdir(), 'mbcs')
    return tempfile.mkdtemp(prefix=prefix, dir=tempdir)

I have two questions:

  1. Why should this work?
  2. How foolproof is this? From a similar question (see this answer: Python Popen failing to use proper encoding in Windows PowerShell) I got the notion that I should maybe use sys.stdout.encoding. Am I anywhere near the mark?

Edit: actually the line:

print u"input encoding: %s; output encoding: %s; locale: %s" % (
    sys.stdin.encoding,getattr(sys.stdout,'encoding',None),
    locale.getdefaultlocale())

prints

input encoding: None; output encoding: None; locale: ('ja_JP', 'cp932')

so maybe I should go for locale.getpreferredencoding() (see for instance subprocess.Popen with a unicode path)
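For comparison, the candidate encodings mentioned so far can be printed side by side. This is a minimal sketch written for Python 3 (the session above is Python 2.7, where some of these values can be None); `encoding_report` is a hypothetical helper name, not part of any library.

```python
import locale
import sys


def encoding_report():
    """Collect the encodings Python might use for console and file names."""
    return {
        # Encoding used to convert text file names to bytes for the OS.
        "filesystem": sys.getfilesystemencoding(),
        # Locale-derived default for text data; False avoids calling setlocale().
        "preferred": locale.getpreferredencoding(False),
        # May be None when the stream is replaced by an object without .encoding.
        "stdout": getattr(sys.stdout, "encoding", None),
        "stdin": getattr(sys.stdin, "encoding", None),
    }


if __name__ == "__main__":
    for name, value in sorted(encoding_report().items()):
        print("%s: %s" % (name, value))
```

The filesystem encoding is the one that matters for paths handed to `tempfile`; the console encodings only affect printing.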

Edit2: in the comments it was suggested that I encode the prefix in mbcs. Unfortunately this is not an option, as the codebase expects unicode everywhere and will blow up sooner or later. The code posted is a simplified fragment.

Edit3: my little workaround apparently did not work around anything, so I will try:

fsenc = sys.getfilesystemencoding() or 'mbcs'
return tempfile.mkdtemp(prefix=prefix.encode(fsenc)).decode(fsenc)

if there's any non-ASCII user left to test with, that is.
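The encode-then-decode round trip in Edit3 has a rough Python 3 analogue, sketched here under the assumption of Python 3.5 or later, where `tempfile.mkdtemp` accepts a bytes prefix and then returns a bytes path. `make_temp_dir` is a hypothetical name used only for illustration; on Python 3 the text prefix could simply be passed as is, so this only mirrors the shape of the Python 2 workaround.

```python
import os
import tempfile


def make_temp_dir(prefix="tmp"):
    """Create a temp dir via a bytes prefix and return its path as text."""
    # Encode the text prefix with the filesystem encoding, as in Edit3.
    raw_prefix = os.fsencode(prefix)
    # With a bytes prefix (Python 3.5+), mkdtemp returns a bytes path.
    raw_path = tempfile.mkdtemp(prefix=raw_prefix)
    # Decode back so callers keep working with text everywhere.
    return os.fsdecode(raw_path)
```

Calling `make_temp_dir("x")` should give back a text path whose last component starts with `x`; the caller is responsible for removing the directory afterwards.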

Meanwhile - the reproducers below don't work for me:

C:\_\Python27\python.exe -u C:\__\JetBrains\PyCharm 3.4.1\helpers\pydev\pydevconsole.py 18324 18325
PyDev console: starting.
import sys; print('Python %s on %s' % (sys.version, sys.platform))
Python 2.7.8 (default, Jun 30 2014, 16:03:49) [MSC v.1500 32 bit (Intel)] on win32
sys.path.extend(['C:\\Dropbox\\eclipse_workspaces\\python\\wrye-bash'])
>>> d = u'ελληνικα'.encode(sys.getfilesystemencoding()); os.environ['TEMP'] = os.path.abspath(d)
>>> import tempfile; tempfile.mkdtemp(prefix=u'x')
u'c:\\users\\mrd\\appdata\\local\\temp\\xtf3nav'

and variations...

Edit4: the directory does exist, but it is only found under the unicode form of its name:

>>> d = u'ελληνικα'.encode(sys.getfilesystemencoding()); os.path.abspath(d)
'C:\\Dropbox\\eclipse_workspaces\\python\\wrye-bash\\e??????a'
>>> assert os.path.isdir(os.path.abspath(d))
Traceback (most recent call last):
  File "<input>", line 1, in <module>
AssertionError
>>> d = u'ελληνικα'
>>> os.path.abspath(d)
u'C:\\Dropbox\\eclipse_workspaces\\python\\wrye-bash\\\u03b5\u03bb\u03bb\u03b7\u03bd\u03b9\u03ba\u03b1'
>>> assert os.path.isdir(os.path.abspath(d))
>>> 
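One way to see why the encoded name is not the same directory: encoding to a code page that lacks the characters is lossy. The sketch below is Python 3 and uses Python's own `cp1252` and `cp1253` codecs as stand-ins for an ANSI code page, which is an assumption for illustration. Windows' own conversion can behave differently (the `e??????a` output above suggests it substitutes look-alike letters where it can). `survives_round_trip` is a hypothetical helper.

```python
def survives_round_trip(text, encoding):
    """Return True if text can be encoded and decoded back unchanged."""
    # errors="replace" substitutes "?" for unencodable characters,
    # loosely imitating a lossy ANSI conversion.
    encoded = text.encode(encoding, errors="replace")
    return encoded.decode(encoding) == text


# The Greek directory name from the session above, written with escapes.
name = "\u03b5\u03bb\u03bb\u03b7\u03bd\u03b9\u03ba\u03b1"

# cp1252 (Western European) has no Greek letters, so the name is destroyed.
lossy = survives_round_trip(name, "cp1252")
# cp1253 (Greek) contains every letter, so the name survives.
intact = survives_round_trip(name, "cp1253")
```

Once characters have been replaced, the byte string names a different (and here invalid) path, which matches the failing `os.path.isdir` assertion.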
Mr_and_Mrs_D
  • @eryksun: they discuss this in the bug - the problem is the username. How foolproof would mbcs be? Would it work for the Japanese user? – Mr_and_Mrs_D Jan 23 '15 at 01:24
  • No, it's not merely the username, because `os.path.join` won't raise a `UnicodeDecodeError` unless you're mixing a non-ASCII `str` with `unicode`. That happens in this case if `prefix` is `unicode`. – Eryk Sun Jan 23 '15 at 01:29
  • `'mbcs'` is implemented using Windows `MultiByteToWideChar` (decode) and `WideCharToMultiByte` (encode) for the `CP_ACP` encoding (the system ANSI encoding), which is what `locale.getpreferredencoding()` returns. That said, Python's codec for `'cp932'` is probably more strict than Windows when it comes to undefined characters. I know that's the case for `'cp1252'`, which fails for `'\x81'.decode('cp1252')`, while `'\x81'.decode('mbcs') == u'\x81'`. – Eryk Sun Jan 23 '15 at 01:38
  • the issue is marked "out of date". If you see the failure on Python 2.7; please, leave the message on the bug tracker. – jfs Jan 23 '15 at 13:43
  • @J.F.Sebastian, to replicate the `UnicodeDecodeError`, set `os.environ['TEMP']` to an existing path that contains a non-ASCII character and then pass a `unicode` string for `prefix`. – Eryk Sun Jan 23 '15 at 15:19
  • @eryksun: thanks for the reproducer. I edited the question to clarify that using a str prefix is not an option. – Mr_and_Mrs_D Jan 23 '15 at 15:35
  • have you tried `return tempfile.mkdtemp(prefix=prefix.encode('mbcs') if isinstance(prefix, unicode) else prefix).decode('mbcs')`? (for portability, you could use `sys.getfilesystemencoding()` here). On Python 3, just pass the unicode string as is. – jfs Jan 23 '15 at 15:35
  • @J.F.Sebastian: the code posted is simplified - I need the return value to be in unicode. So I would need to reencode that ? Would `tempdir = unicode(tempfile.gettempdir(), sys.getfilesystemencoding()) return tempfile.mkdtemp(prefix=prefix, dir=tempdir)` be the answer (without messing with the prefix) ? – Mr_and_Mrs_D Jan 23 '15 at 15:40
  • the code in my comment returns `unicode`. It is all the code there is: no try/except. – jfs Jan 23 '15 at 15:45
  • @eryksun: I can reproduce it even on Linux in Python 2: `import os, sys; os.environ['TEMP'] = d = u'\N{SNOWMAN}'.encode(sys.getfilesystemencoding()); assert os.path.isdir(d)`. Then `import tempfile; tempfile.mkdtemp(prefix=u'x')` raises `UnicodeDecodeError` on `path += '/' + b` line in `posixpath.py`. It is the same error as: `path = u'\N{SNOWMAN}'.encode('utf-8'); path += '/' + u'x'` – jfs Jan 23 '15 at 16:02
  • @J.F.Sebastian: right - the prefix will always be unicode in my case - would then `return tempfile.mkdtemp(prefix=prefix.encode(sys.getfilesystemencoding())).decode(sys.getfilesystemencoding())` be the answer ? - if yes post it please – Mr_and_Mrs_D Jan 23 '15 at 18:40
  • @Mr_and_Mrs_D: unless `isinstance()` call is a performance bottleneck in your application (I doubt it); leave it. In principle, `sys.getfilesystemencoding()` may be `None` in some cases before Python 3.2 (it is `None` in Jython) -- test it. If you tested and the solution works for you; you can [post it as your own answer](http://stackoverflow.com/help/self-answer) – jfs Jan 23 '15 at 19:44
  • I was (kindly) requested to file a bug report at (http://bugs.python.org/issue1681974) - I really do not qualify to make it - I would accept as an answer a link to a bug report. The prefix is under my control - so that'd be `fsenc = sys.getfilesystemencoding() or 'mbcs'; return tempfile.mkdtemp(prefix=prefix.encode(fsenc)).decode(fsenc)` ? I do post answers to my questions, it's just that testing is not trivial in my setup – Mr_and_Mrs_D Jan 23 '15 at 22:45
  • @Mr_and_Mrs_D: `assert os.path.isdir(d)` is not optional. If `d` is not a directory; `tempfile` won't use it. – jfs Jan 25 '15 at 10:39
  • @J.F.Sebastian: see edit4 – Mr_and_Mrs_D Jan 25 '15 at 11:46
  • No. The directory does not exist. The system considers the paths to be *different* in this case. Call `os.mkdir(d)` to make sure it exists – jfs Jan 25 '15 at 11:53
  • You mean: `>>> os.environ['TEMP'] = d = u'ελληνικα'.encode(sys.getfilesystemencoding()) >>> os.mkdir(d) Traceback (most recent call last): File "", line 1, in WindowsError: [Error 123] The filename, directory name, or volume label syntax is incorrect: 'e??????a'` - I had tried this of course (it's in the "variations" clause) – Mr_and_Mrs_D Jan 25 '15 at 14:39
  • @J.F.Sebastian: And my favorite variation: `>>> import os, sys; os.environ['TEMP'] = d = u'ελληνικα'; os.path.abspath(d) Traceback (most recent call last): File "", line 1, in File "C:\_\Python27\lib\os.py", line 422, in __setitem__ putenv(key, item) UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-7: ordinal not in range(128)`. As far as I am concerned all these tracebacks are bugs. Anyway let's hear from the user - if ever again – Mr_and_Mrs_D Jan 25 '15 at 14:41
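The failure described in the comments comes from Python 2 implicitly decoding a byte `str` as ASCII when it is joined with a `unicode` value. Python 3 has no such implicit conversion (mixing the two raises `TypeError`), so the sketch below writes the ASCII decode out explicitly to imitate what Python 2 does. `py2_style_join` is a hypothetical name, not a real library function.

```python
def py2_style_join(directory, name):
    """Imitate Python 2 joining a byte-string directory with a unicode name."""
    # Python 2 would decode `directory` with the ASCII codec behind the scenes;
    # here the same step is written out explicitly.
    return directory.decode("ascii") + "/" + name


# A directory name containing U+2603 SNOWMAN, encoded as UTF-8 bytes.
snowman_dir = "\N{SNOWMAN}".encode("utf-8")

try:
    py2_style_join(snowman_dir, "x")
    failed = False
except UnicodeDecodeError:
    # Same error class that tempfile.mkdtemp(prefix=u'x') hits in the comments.
    failed = True
```

An all-ASCII temp directory never triggers the error, which is why the problem only shows up for some users.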

1 Answer

I finally went with:

import sys
import tempfile
import traceback

sys_fs_enc = sys.getfilesystemencoding() or 'mbcs'

@staticmethod
def tempDir(prefix=None):
    try: # workaround for http://bugs.python.org/issue1681974 see there
        return tempfile.mkdtemp(prefix=prefix)
    except UnicodeDecodeError:
        try:
            traceback.print_exc()
            print 'Trying to pass temp dir in...'
            tempdir = unicode(tempfile.gettempdir(), sys_fs_enc)
            return tempfile.mkdtemp(prefix=prefix, dir=tempdir)
        except UnicodeDecodeError:
            try:
                traceback.print_exc()
                print 'Trying to encode temp dir prefix...'
                return tempfile.mkdtemp(
                    prefix=prefix.encode(sys_fs_enc)).decode(sys_fs_enc)
            except:
                traceback.print_exc()
                print 'Failed to create tmp dir, Bash will not function ' \
                      'correctly.'

Apparently the first try/except is sufficient, but I left the tracebacks in so I can get some more input ;)
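The nested try/except blocks could also be flattened into a list of strategies tried in order. The following is only a sketch, written for Python 3; `first_successful` is a hypothetical helper, and re-raising the last error (instead of falling through and returning None, as the code above does) is my assumption about what callers would prefer.

```python
import logging

log = logging.getLogger(__name__)


def first_successful(strategies, errors=(Exception,)):
    """Call each zero-argument strategy in turn; return the first result."""
    last_error = None
    for strategy in strategies:
        try:
            return strategy()
        except errors as exc:
            # Keep the traceback in the log, as the answer does with print_exc().
            log.warning("strategy %r failed", strategy, exc_info=True)
            last_error = exc
    if last_error is None:
        raise ValueError("no strategies given")
    # Assumption: raising is preferable to silently returning None.
    raise last_error
```

Each of the three `mkdtemp` attempts would then become one `lambda` in the list, with `errors=(UnicodeDecodeError,)` to keep the narrow exception handling of the first two levels.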

Mr_and_Mrs_D