3

I will just ask if anyone knows using python 2.7, how will i be able to pass a unicode string (e.g. Japanese filename) as a commandline argument of my python script. Once this filename is passed to the function/method correctly, some file processing will be done (e.g. metadata extraction / retrieval) by some engine (a DLL, which is identified to have unicode support). I've tried the following but unfortunately, python crashes:

Passing of the filename to the method that will process the file:

processingMethod(unicode(argv[1], "utf-8", errors="ignore").encode("utf-8"))

on the method, this is how I decode the string passed:

unicode(file_path).decode("utf-8")

Any feedback will be of great help. Thanks a lot!

Adam Mihalcin
  • 14,242
  • 4
  • 36
  • 52
jaysonpryde
  • 2,733
  • 11
  • 44
  • 61

1 Answers1

0

unicode(argv[1], "utf-8"

Unfortunately, the encoding used by the Windows command prompt is never(*) UTF-8. It's a locale-specific encoding, so you can only pass Japanese characters in an argument on a Japanese Windows install.

If you want to be able to read Unicode characters in arguments reliably from Python 2, you will have to sniff to detect you're running on Windows, and use the Windows-specific APIs to read args instead of the standard C library ones that rely on the locale encoding. See this answer for an example of doing it with ctypes.

(*: well, unless you do chcp 65001, but that causes lots of other stuff to fall over so is best avoided.)

Community
  • 1
  • 1
bobince
  • 528,062
  • 107
  • 651
  • 834