I've run smack into a problem with subprocess.call() when running a batch file with Unicode characters in the path name. This barfs in 2.6 and 2.7 but works perfectly in 3.2. Was it really just a bug that lasted all the way until py3k?

# -*- coding: utf-8 -*-

o = u"C:\\temp\\test.bat"        #"control" case
q = u"C:\\temp\\こんにちは.bat"

ho = open(o, 'r')
hq = open(q, 'r')               #so we can open q

ho.close()
hq.close()

import subprocess
subprocess.call(o)              #batch runs
subprocess.call(q)              #fails; none of the calls from here on down run the batch file
subprocess.call(q, shell=True)
subprocess.call(q.encode('utf8'), shell=True)
subprocess.call(q.encode('mbcs'), shell=True)  #this was suggested elsewhere for older Windows
jambox
  • BTW there are a number of near-duplicates, but I believe this is slightly different from all of the ones I've looked at. – jambox Mar 08 '12 at 12:11
  • possible duplicate of [Unicode filename to python subprocess.call()](http://stackoverflow.com/questions/2595448/unicode-filename-to-python-subprocess-call) – Ferdinand Beyer Mar 08 '12 at 12:21
  • How is this question any different? The `subprocess` module has trouble with unicode strings in version 2.x. Since 3.0, all strings are unicode and the problem went away. – Ferdinand Beyer Mar 08 '12 at 12:23
  • OK you're right, it seems like quite a famous bug. Maybe I just couldn't bring myself to believe it! – jambox Mar 08 '12 at 12:34

1 Answer

Filenames are passed to and returned from APIs as (Unicode) strings. This can present platform-specific problems because on some platforms filenames are arbitrary byte strings. (On the other hand, on Windows filenames are natively stored as Unicode.) As a work-around, most APIs (e.g. open() and many functions in the os module) that take filenames accept bytes objects as well as strings, and a few APIs have a way to ask for a bytes return value. Thus, os.listdir() returns a list of bytes instances if the argument is a bytes instance, and os.getcwdb() returns the current working directory as a bytes instance. Note that when os.listdir() returns a list of strings, filenames that cannot be decoded properly are omitted rather than raising UnicodeError.

From the "What's New In Python 3.0" page.
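To make the quoted behaviour concrete, here is a minimal Python 3 sketch (not from the release notes themselves). The helper name list_both_ways and the temporary file are made up for illustration; os.listdir(), os.getcwdb() and os.fsencode() are standard library calls. It assumes the filesystem can store the non-ASCII name, which is the normal case on Windows and on UTF-8 systems.

```python
# -*- coding: utf-8 -*-
import os
import tempfile


def list_both_ways(directory):
    """Hypothetical helper: list a directory once as str and once as bytes."""
    # str argument in -> list of str names out
    as_text = sorted(os.listdir(directory))
    # bytes argument in -> list of bytes names out
    as_bytes = sorted(os.listdir(os.fsencode(directory)))
    return as_text, as_bytes


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Same kind of file name as in the question, created in a scratch directory.
        path = os.path.join(tmp, "こんにちは.bat")
        with open(path, "w") as handle:
            handle.write("@echo off\n")

        text_names, byte_names = list_both_ways(tmp)
        print(text_names)            # names as Unicode strings
        print(byte_names)            # the same names as raw bytes
        print(type(os.getcwdb()))    # current directory, returned as bytes
```

In 3.x the str form is the default and, as the question reports for 3.2, a path like this can be handed to subprocess.call() directly; the bytes form is the escape hatch for names that cannot be decoded.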

Burhan Khalid
  • Thanks. I should have looked in the 3.0 release notes, but I couldn't find any reference to this in the 2.7 docs. – jambox Mar 08 '12 at 12:35