9

I have a unicode filename that I would like to open. The following code:

cmd = u'cmd /c "C:\\Pok\xe9mon.mp3"'
cmd = cmd.encode('utf-8')
subprocess.Popen(cmd)

returns

>>> 'C:\Pokיmon.mp3' is not recognized as an internal or external command, operable program or batch file.

even though the file does exist. Why is this happening?

Eryk Sun
iTayb

4 Answers

12

It looks like you're using Windows and Python 2.X. Use os.startfile:

>>> import os
>>> os.startfile(u'Pokémon.mp3')

Less intuitively, this is how to get the command shell to do the same thing:

>>> import subprocess
>>> import locale
>>> subprocess.Popen(u'Pokémon.mp3'.encode(locale.getpreferredencoding()), shell=True)

On my system, the command shell (cmd.exe) uses the cp437 encoding, while Windows programs use cp1252. Popen wanted shell commands encoded as cp1252. This seems like a bug, and it appears to be fixed in Python 3.X:

>>> import subprocess
>>> subprocess.Popen('Pokémon.mp3', shell=True)
Mark Tolonen
  • Thanks! I didn't know about `os.startfile`. – iTayb Mar 31 '12 at 01:20
  • On Windows on Python 2, `Popen(u'Pokémon.mp3'.encode(encoding))` works iff `Popen(u'Pokémon.mp3'.encode('mbcs'))` works, i.e., it should succeed with `cp1252` and fail with `cp437` in your case. Does `shell=True` change it? What are the values of `sys.getfilesystemencoding()` and `locale.getpreferredencoding()`? In general, `u"é"` might be unrepresentable using `mbcs`. Python 3 uses the Unicode API directly. – jfs Mar 23 '14 at 19:30
  • 1
    On Windows on Python 2, if you want to use a Unicode command line (as in Python 3), you can use [this workaround](http://vaab.blog.kal.fr/2017/03/16/fixing-windows-python-2-7-unicode-issue-with-subprocesss-popen/), which leverages `ctypes` to patch `subprocess.Popen(..)`. – vaab Mar 16 '17 at 10:58
  • 2
    `os.startfile` works, but `u'Pokémon.mp3'.encode(locale.getpreferredencoding())` will of course fail in any locale in which the ANSI codepage doesn't map "é". In 2.x `subprocess.Popen` calls `CreateProcessA`, which decodes the command line as ANSI, so it is limited to commands that can be encoded as such. If you need a command line that can't be encoded as ANSI, then you must do something else via ctypes, cffi, or an extension module, such as call `CreateProcessW` or a CRT function such as `_wsystem`. – Eryk Sun Sep 07 '17 at 22:37
  • 1
    CMD is a Unicode application. It only uses codepages to decode bytes when working with files and pipes, such as reading a line of a batch script or a `for /f` loop that reads stdout from a command. In this case its default codepage is ANSI if it isn't attached to a console. Otherwise it uses the *console's* input or output codepage (CMD is not the console), which defaults to OEM unless changed via chcp.com. In any case, the encoding CMD uses for files is irrelevant. By the time CMD sees its command line, it's already decoded as Unicode by Windows. – Eryk Sun Sep 07 '17 at 22:41
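
The point made in the comments above, that the `encode(locale.getpreferredencoding())` workaround only helps when the ANSI code page has a slot for every character in the command line, can be checked without running anything on Windows. The following is only a minimal sketch: `fits_codepage` is a hypothetical helper name, and `cp1252` (Western European) and `cp1255` (Hebrew) are picked purely as example code pages, not taken from anyone's actual system.

```python
from __future__ import print_function


def fits_codepage(text, codepage):
    """Return True if every character of `text` can be encoded in `codepage`.

    `fits_codepage` is a hypothetical helper for illustration only.
    """
    try:
        text.encode(codepage)
    except UnicodeEncodeError:
        return False
    return True


filename = u'Pok\xe9mon.mp3'

# cp1252 maps U+00E9, so the byte-string workaround can work in that locale.
print(fits_codepage(filename, 'cp1252'))

# cp1255 has no slot for U+00E9, so encoding the command line fails there.
print(fits_codepage(filename, 'cp1255'))
```

When the check fails, no choice of byte encoding will help on Python 2; the options are the ones already listed above (`os.startfile`, a wide-character API via ctypes, or Python 3).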
2

Your problem can be solved with the smart_str function from Django's django.utils.encoding module.

Use this code:

import subprocess
from django.utils.encoding import smart_str

cmd = u'cmd /c "C:\\Pok\xe9mon.mp3"'
smart_cmd = smart_str(cmd)  # byte string; smart_str encodes as UTF-8 by default
subprocess.Popen(smart_cmd)

You can find information on how to install Django on Windows here. You can first install pip, then install Django by starting a command shell with administrator privileges and running this command:

pip install Django

This will install Django in your Python installation's site-packages directory.

Community
Thanasis Petsas
  • I won't install a whole new framework just to encode Unicode correctly. The fix should be one or two lines long, not 1000+ lines of complex code. – iTayb Mar 30 '12 at 12:18
  • OK, I'm sorry, I have updated my answer. Maybe it is more helpful now. – Thanasis Petsas Mar 30 '12 at 14:01
  • First, the `latin-1` encoding is not Unicode. It won't work for all Unicode cases. Second, it still doesn't work. Try it yourself. – iTayb Mar 30 '12 at 14:29
  • OK, I work on Linux and tested it with `os.popen`, where it worked. Maybe it doesn't work on Windows. :( I have removed the updated part of my answer. – Thanasis Petsas Mar 30 '12 at 15:42
  • 1
    Someone made a standalone module out of Django smart_str: [smartencoding](https://pypi.org/project/smartencoding/) – gaborous Jan 01 '19 at 15:00
  • @gaborous that's really helpful! Good idea to isolate and include that functionality in a module. thanks! – Thanasis Petsas Jan 19 '19 at 10:09
-1
>>> subprocess.call(['start', u'avión.mp3'.encode('latin1')], shell=True)
0

There's no need to call cmd if you use the shell parameter. The correct way to launch an associated program is to use cmd's start built-in, AFAIK.

My 2c, HIH.

KurzedMetal
  • Thanks for the side note, but this still doesn't fix the Unicode problem. This works on your system because your locale's MBCS code page has the ó char. This code won't work on computers that have Hebrew or Japanese as their locale language. – iTayb Mar 30 '12 at 19:40
-2

I think Windows uses 16-bit characters; I'm not sure whether it's UCS-2 or UTF-16 or something like that. So I guess it could have an issue with UTF-8.

katzenversteher
  • Setting it as 'utf-16' returns `TypeError: must be string without null bytes or None, not str`, so I guess that's wrong. – iTayb Mar 30 '12 at 10:36
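
The `TypeError` quoted in the comment above is consistent with how UTF-16 lays out ASCII characters: each one takes two bytes, one of which is zero, so the encoded command line is full of null bytes. A minimal sketch to confirm this (`has_null_bytes` is a hypothetical helper name, used only for illustration):

```python
from __future__ import print_function


def has_null_bytes(text, encoding):
    """Return True if `text` encoded with `encoding` contains a zero byte.

    `has_null_bytes` is a hypothetical helper for illustration only.
    """
    return b'\x00' in text.encode(encoding)


cmd = u'cmd /c "C:\\Pok\xe9mon.mp3"'

# UTF-16 pads every ASCII character with a zero byte.
print(has_null_bytes(cmd, 'utf-16'))

# UTF-8 never produces a zero byte unless the text itself contains U+0000.
print(has_null_bytes(cmd, 'utf-8'))
```

So a UTF-16 byte string cannot be handed to an API that expects a null-terminated byte string, which matches the error message; it does not mean that Windows itself rejects 16-bit text.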