13

Running Python 2.7

When executing:

$ python client.py get_emails -a "åäö"

I get:

usage: client.py get_emails [-h] [-a AREA] [-t {rfc2822,plain}]
client.py get_emails: error: argument -a/--area: invalid unicode value: '\xc3\xa5\xc3\xa4\xc3\xb6'

This is my parser:

import argparse

def _argparse():
    desc = """
           Simple CLI-client for...
           """
    argparser = argparse.ArgumentParser(description=desc)
    subparsers = argparser.add_subparsers(dest='command')

    # create the parser for the "get_emails" command
    parser_get_emails = subparsers.add_parser('get_emails', help=u'Get email list')
    parser_get_emails.add_argument('-a', '--area', type=unicode, help='Limit to area')
    parser_get_emails.add_argument('-t', '--out_type', choices=['rfc2822', 'plain'],
                                   default='rfc2822', help='Type of output')

    args = argparser.parse_args()
    return args

Does this mean I can't use any Unicode characters with Python's argparse module?

Niclas Nilsson

2 Answers

17

You can try

type=lambda s: unicode(s, 'utf8')

instead of

type=unicode

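Without an encoding argument, unicode() falls back to the default encoding, which is ASCII in Python 2, so the UTF-8 bytes from the command line cannot be decoded.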
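The difference can be illustrated with the exact bytes from the error message in the question. This is only a sketch: it uses bytes.decode(), which behaves the same way in Python 2 and Python 3, rather than unicode() itself, and the helper name try_decode is made up for the example.

```python
# Bytes taken from the argparse error message: "åäö" encoded as UTF-8.
raw = b'\xc3\xa5\xc3\xa4\xc3\xb6'

def try_decode(data, encoding):
    """Return the decoded text, or None if data is not valid in encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None

# ASCII cannot represent bytes above 0x7f, so decoding fails.
ascii_result = try_decode(raw, 'ascii')

# With the right encoding the six bytes become three characters.
utf8_result = try_decode(raw, 'utf-8')
```

Here ascii_result is None, which corresponds to the "invalid unicode value" error, while utf8_result holds the three-character text.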

gog
  • Great, I will test it tomorrow when I'm in front of the computer again. I was thinking of using `lambda s: unicode(s, locale.getdefaultlocale()[1])`, which I suppose would be more flexible. Any caveats? – Niclas Nilsson Apr 08 '14 at 20:41
  • @NiclasNilsson: `getdefaultlocale` can return `(None, None)` under some circumstances, so you're going to need a fallback, like `getdefaultlocale()[1] or 'utf8'` – gog Apr 08 '14 at 20:50
  • The encoding may be different. Use `sys.getfilesystemencoding()` instead of hardcoding `utf8` here. – jfs Apr 15 '14 at 13:34
  • Is that safer than `locale.getdefaultlocale()[1]`? – Niclas Nilsson Apr 15 '14 at 19:53
  • @NiclasNilsson: I have little experience with non-unicode consoles, but [here](http://stackoverflow.com/questions/4012571/python-which-encoding-is-used-for-processing-sys-argv) people say that `getfilesystemencoding` is not the argv encoding. It _might_ be `sys.stdin.encoding` though. – gog Apr 15 '14 at 20:58
  • @georg: I don't see `sys.getfilesystemencoding()` mentioned in the link you provided. Why do you think `sys.argv` items are not in `sys.getfilesystemencoding()`? There are issues with undecodable arguments but it is a different problem. – jfs Jul 05 '14 at 00:29
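The fallback idea from the comments above (`getdefaultlocale()[1] or 'utf8'`) can be generalized into a small helper. This is a hedged sketch, not something from the thread: the name first_usable_encoding is hypothetical, and which candidates to pass (for example `sys.getfilesystemencoding()`, `locale.getdefaultlocale()[1]` or `sys.stdin.encoding`) is left to the caller, since the comments disagree on the best source.

```python
import codecs

def first_usable_encoding(candidates, default='utf-8'):
    """Return the first candidate that names a known codec, else default.

    Candidates may be None or empty, as happens when the locale
    is not configured; such entries are skipped.
    """
    for name in candidates:
        if not name:
            continue
        try:
            codecs.lookup(name)  # raises LookupError for unknown codecs
        except LookupError:
            continue
        return name
    return default

# Example candidates; in real code they would come from sys or locale.
chosen = first_usable_encoding([None, '', 'no-such-codec', 'latin-1'])
fallback = first_usable_encoding([None])
```

With these inputs, chosen is 'latin-1' and fallback is 'utf-8'. The result could then be used as `type=lambda s: unicode(s, chosen)` in the Python 2 parser.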
15

The command-line arguments are encoded using sys.getfilesystemencoding():

import sys

def commandline_arg(bytestring):
    unicode_string = bytestring.decode(sys.getfilesystemencoding())
    return unicode_string

# ...
parser_get_emails.add_argument('-a', '--area', type=commandline_arg)

Note: You don't need this in Python 3, where the arguments are already Unicode. Python 3 uses os.fsdecode() in this case because command-line arguments might sometimes be undecodable. See PEP 383 -- Non-decodable Bytes in System Character Interfaces.
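For a script that has to run under both Python 2 and Python 3, the converter can be written so that it only decodes when it actually receives bytes. The following is a minimal sketch under some assumptions: the `or 'utf-8'` fallback is an addition for the case where `sys.getfilesystemencoding()` returns None, build_parser is a hypothetical helper that reproduces only part of the parser from the question, and the argument list is passed explicitly so the example does not depend on the terminal.

```python
import argparse
import sys

def commandline_arg(value):
    """Return value as text, decoding it first if it is a byte string."""
    if isinstance(value, bytes):
        # Python 2: sys.argv items are byte strings.
        # Assumption: fall back to UTF-8 if no encoding is reported.
        encoding = sys.getfilesystemencoding() or 'utf-8'
        return value.decode(encoding)
    # Python 3: the argument is already text.
    return value

def build_parser():
    parser = argparse.ArgumentParser(description='Simple CLI client (sketch)')
    subparsers = parser.add_subparsers(dest='command')
    get_emails = subparsers.add_parser('get_emails', help='Get email list')
    get_emails.add_argument('-a', '--area', type=commandline_arg,
                            help='Limit to area')
    return parser

# Explicit argument list instead of sys.argv, for a reproducible example.
args = build_parser().parse_args(['get_emails', '-a', u'\xe5\xe4\xf6'])
```

After parsing, args.area holds the text "åäö" on either Python version.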

jfs
  • The actual answer may be more complicated if you need to support filenames that are undecodable in the current locale encoding (Unicode API on Windows, misconfigured locale on Linux). See more details in [How to work with paths containing Russian characters?](http://ru.stackoverflow.com/a/527872/23044) (in Russian; ask if you need the translation) – jfs Jun 03 '16 at 16:46
  • This works for me. In my case the argument value is a string of Chinese characters. With the accepted answer, argparse complains `invalid value:`. – ElpieKay Mar 16 '21 at 13:20