All my scripts use Unicode literals throughout, with
from __future__ import unicode_literals
but this creates a problem when functions may be called with bytestrings, and I'm wondering what the best approach is for handling this and producing clear, helpful errors.
I gather that one common approach, which I've adopted, is to simply make this clear when it occurs, with something like
def my_func(somearg):
    """The 'somearg' argument must be Unicode."""
    if not isinstance(somearg, unicode):
        raise TypeError("Parameter 'somearg' should be a Unicode string")
    # ...
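For what it's worth, here is a runnable sketch of that guard pattern. The `text_type` alias is my own addition so the example also runs on Python 3, where the `unicode` builtin no longer exists; the `return` is a placeholder for the real work.

```python
# The text_type alias is an assumption for Python 2/3 compatibility,
# not part of the original code.
try:
    text_type = unicode   # Python 2
except NameError:
    text_type = str       # Python 3

def my_func(somearg):
    """The 'somearg' argument must be Unicode."""
    if not isinstance(somearg, text_type):
        raise TypeError("Parameter 'somearg' should be a Unicode string")
    return somearg        # placeholder for the real work

# Unicode input passes through; a bytestring fails fast with a clear error.
my_func(u'caf\xe9')
try:
    my_func(b'caf\xc3\xa9')
except TypeError as exc:
    print(exc)
```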
for all arguments that need to be Unicode (and might be bytestrings). However, even with these checks in place, I run into problems in my argparse
command-line script when supplied parameters correspond to such arguments, and I wonder what the best approach is here. It seems that I can simply decode such arguments using the system's encoding, with, for example
if __name__ == '__main__':
    parser = argparse.ArgumentParser(...)
    parser.add_argument('somearg', ...)
    # ...
    args = parser.parse_args()
    some_arg = args.somearg
    if not isinstance(some_arg, unicode):
        some_arg = some_arg.decode(sys.getfilesystemencoding())
    # ...
    my_func(some_arg, ...)
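One variation I've considered: since argparse's `type=` parameter accepts any callable, the decode step above can be moved to parse time so every consumer sees Unicode. This is only a sketch, and the `to_text` helper name is my own invention:

```python
import argparse
import sys

def to_text(s):
    # Hypothetical helper: decode a bytestring using the filesystem
    # encoding (a guess at how argv was encoded); Unicode text passes
    # through unchanged. On Python 3, argv is already text.
    if isinstance(s, bytes):
        return s.decode(sys.getfilesystemencoding())
    return s

parser = argparse.ArgumentParser()
parser.add_argument('somearg', type=to_text)
args = parser.parse_args(['caf\xe9'])
# args.somearg is Unicode text regardless of the input type
print(repr(args.somearg))
```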
Is this combination of approaches a common design pattern for Unicode modules that may receive bytestring inputs? Specifically,
- can I reliably decode command-line arguments in this way,
- will sys.getfilesystemencoding() give me the correct encoding for command-line arguments, and
- does argparse provide some built-in facility for accomplishing this that I've missed?