
I need to handle currency symbols passed as command-line arguments to my Python script, but I don't know how to convert them into a usable form.

Example:

My input:

--amount 100.0 --input_currency € --output_currency CZK

What I get for the € symbol:

\x80

What I need to get:

u'\u20ac'

I tried decode('utf-8'), but it doesn't work. It raises:

UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 0: invalid start byte

Can you help me please?

Warle
  • The operating system you're using affects how parameters to your program are encoded. What OS are you using? – Tom Dalton Jan 15 '16 at 23:50

2 Answers


On POSIX systems, the encoding used for those strings depends entirely on how your console or terminal is configured.

In those environments, use locale.getpreferredencoding() to query what encoding was configured, then use that to decode the string. This is not foolproof, but should work whenever the console or terminal was configured correctly.
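As a minimal sketch of that approach: the helper name decode_arg is made up for illustration, and it assumes the argument arrives as a byte string (as it does in Python 2's sys.argv). The encoding is passed explicitly in the example call so the result does not depend on the locale of the machine running it.

```python
import locale


def decode_arg(raw, encoding=None):
    """Decode one raw command-line argument (bytes) to Unicode text.

    decode_arg is a hypothetical helper, not part of any library.
    """
    if encoding is None:
        # Ask the locale which encoding the console or terminal should be using.
        encoding = locale.getpreferredencoding()
    # Strict decoding: a wrongly configured terminal raises UnicodeDecodeError.
    return raw.decode(encoding)


# Explicit encoding here, so the example behaves the same on every machine.
euro = decode_arg(b'\x80', 'cp1252')
```

In a real script you would call decode_arg(sys.argv[i]) without the second argument and let the locale decide.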

In your specific case you probably are using a Windows system configured to use Windows Codepage 1252:

>>> '\x80'.decode('cp1252')
u'\u20ac'
>>> print '\x80'.decode('cp1252')
€
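To see why the UTF-8 attempt failed while cp1252 works, here is a small illustrative check (the variable names are mine). It also shows a limit of the code page: cp1252 only has room for 256 characters, so a symbol such as the new sheqel sign (U+20AA) cannot be represented in it at all.

```python
raw = b'\x80'  # the byte the script received for the euro sign

# In UTF-8, 0x80 is a continuation byte and cannot start a character.
try:
    raw.decode('utf-8')
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# In code page 1252 the same byte is the euro sign.
euro = raw.decode('cp1252')

# The new sheqel sign has no slot in cp1252, so encoding it fails.
try:
    u'\u20aa'.encode('cp1252')
    sheqel_ok = True
except UnicodeEncodeError:
    sheqel_ok = False
```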

Windows does provide the GetCommandLineW() and CommandLineToArgvW() functions to retrieve the Unicode value of the command line and to parse that value into an argv-like array. You can call them from Python through the ctypes library; paraphrasing an existing example, this is how you could use them:

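from ctypes import WINFUNCTYPE, windll, POINTER, byref, c_int
from ctypes.wintypes import LPWSTR, LPCWSTR

# Prototypes for the two Win32 functions (Windows only).
GetCommandLineW = WINFUNCTYPE(LPWSTR)(("GetCommandLineW", windll.kernel32))
CommandLineToArgvW = WINFUNCTYPE(POINTER(LPWSTR), LPCWSTR, POINTER(c_int))(("CommandLineToArgvW", windll.shell32))

# argc receives the number of entries in the returned array.
argc = c_int(0)
argv_unicode = CommandLineToArgvW(GetCommandLineW(), byref(argc))

# Copy the entries into a list of Unicode strings; this is the full
# command line, so it includes the interpreter and the script path.
args = [argv_unicode[i] for i in range(argc.value)]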
Martijn Pieters
  • ...and because you're using code page 1252, you won't be able to use `₪`, as this character doesn't exist in that code page. The Windows Console is not a good place to be doing anything to do with Unicode. – bobince Jan 16 '16 at 14:39

On Python 3, sys.argv is already a list of Unicode strings. You don't need to do anything.
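For illustration, a minimal Python 3 sketch. It assumes argparse is used for parsing (the question doesn't say) and reuses the option names from the question; the argument list is passed explicitly so the snippet runs without a real command line.

```python
import argparse


def parse_args(argv=None):
    """Parse the options from the question; argparse is an assumption here."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--amount', type=float, required=True)
    parser.add_argument('--input_currency', required=True)
    parser.add_argument('--output_currency', required=True)
    # With argv=None argparse reads sys.argv[1:], which is already str on Python 3.
    return parser.parse_args(argv)


# Stand-in for: --amount 100.0 --input_currency € --output_currency CZK
args = parse_args(['--amount', '100.0',
                   '--input_currency', '\u20ac',
                   '--output_currency', 'CZK'])
```

No decode() call is involved: args.input_currency is a str you can compare or look up directly.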

On Python 2, on Windows, you should use the Unicode API (CommandLineToArgvW(), GetCommandLineW()). It lets you pass characters that can't be represented in the current OEM code page, such as cp437 (the result of chcp).

On Python 2, on POSIX, sys.argv[i] may be an arbitrary byte sequence. Normally, it is encoded using the encoding reported by sys.getfilesystemencoding(), which is derived from the locale on Linux.
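Because the bytes can be arbitrary, decoding may fail, so you have to pick an error policy. A rough sketch, where decode_argv is a hypothetical helper and 'replace' is just one possible choice (it substitutes U+FFFD for bytes that don't fit the encoding instead of raising):

```python
import sys


def decode_argv(raw_args, encoding=None, errors='replace'):
    """Decode a list of raw byte-string arguments to Unicode.

    decode_argv is a made-up helper; errors='replace' is an assumption,
    use 'strict' if you would rather fail loudly on bad input.
    """
    if encoding is None:
        encoding = sys.getfilesystemencoding()
    return [raw.decode(encoding, errors) for raw in raw_args]


# Encodings are given explicitly so the example is reproducible anywhere.
decoded = decode_argv([b'--input_currency', b'\xe2\x82\xac'], encoding='utf-8')
damaged = decode_argv([b'\x80'], encoding='utf-8')  # not valid UTF-8
```

On Python 2 you would call decode_argv(sys.argv) and let the encoding default.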

See Best way to decode command line inputs to Unicode Python 2.7 scripts.

jfs
  • Then Linux appears to act differently from my Mac OS X terminal, where it is the *locale* that determines how command-line arguments are encoded. It is the locale that determines how my terminal input is encoded and thus how Bash receives it, at any rate. – Martijn Pieters Jan 17 '16 at 10:35