I just started learning Python and have a simple program that prints Cześć <input>, where <input> is a name passed on the CMD command line as an argument. If no argument is given, it prints Cześć Świat. This works, but when I pass a name such as Łukasz, the stroke is stripped from the Ł and the program prints Cześć Lukasz instead of the correct Cześć Łukasz.

In Windows CMD I use cd to change to the folder containing the script and then run it with the command: hello.py Łukasz.

My script looks like this (it comes from Google's Python exercises (source); I edited it to handle Unicode characters under Python 2.7 and, for instance, replaced 'hello' with 'cześć'):

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import sys

# Define a main() function that prints a little greeting.
def main():
  # Get the name from the command line, using 'Świat' as a fallback.
  if len(sys.argv) >= 2:
    name = sys.argv[1].decode('cp1252')
  else:
    name = u'Świat'
  str = u'Cześć '+name
  print str.encode('utf-8')

# This is the standard boilerplate that calls the main() function.
if __name__ == '__main__':
  main()

Originally I decoded sys.argv[1] as utf-8, but when the name contained the letter Ó or ó it threw an ugly exception (see this SO answer). With either utf-8 or cp1252, the Polish letters (e.g. ĄĆĘŁŃŚŻŹ) are stripped of their accents. The only exception is Ó/ó, which keeps its accent when decoded with cp1252 and triggers the previously mentioned exception when decoded with utf-8.
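The difference between Ó and the other letters can be reproduced without CMD. This is only a minimal sketch: it assumes the ANSI code page on my machine is cp1252 (the behaviour suggests that, but I have not verified it), it is written to run on Python 3, and the helper names fits and decodes_as_utf8 are made up for illustration.

```python
# -*- coding: utf-8 -*-
# Assumption: the system's ANSI code page is cp1252.
ANSI = 'cp1252'


def fits(ch, codec=ANSI):
    """Return True if the character can be represented in the given codec."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def decodes_as_utf8(raw):
    """Return True if the byte string is valid UTF-8."""
    try:
        raw.decode('utf-8')
        return True
    except UnicodeDecodeError:
        return False


# Ó exists in cp1252 and is stored as the single byte 0xD3.
o_bytes = u'Ó'.encode(ANSI)

print(fits(u'Ó'))                  # Ó is representable in cp1252
print(fits(u'Ł'))                  # Ł is not, so it cannot arrive intact
print(decodes_as_utf8(o_bytes))    # a lone 0xD3 byte is not valid UTF-8
print(decodes_as_utf8(b'Lukasz'))  # plain ASCII decodes with either codec
```

If the assumption holds, this matches what I see: Ó arrives as a cp1252 byte that utf-8 cannot decode, while Ł has already been turned into a plain L before the script gets a chance to decode anything.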

So my question is, how do I retrieve the string intact with the accents from CMD to use in my Python program?

I won't accept answers that suggest to remove/ignore the accents!

Teysz

1 Answer

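This is a known limitation of Python 2 on Windows. sys.argv is populated with byte strings in the system's ANSI code page, not with Unicode, so characters that the code page cannot represent are replaced by their closest match (Ł becomes L) before your script ever sees them. Python 3 reads the command line as Unicode, so upgrading to Python 3 will solve your issue.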
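For illustration, here is roughly what the script could look like under Python 3. Treat it as a sketch, not a drop-in replacement: the build_greeting helper is a name introduced here to keep the logic separate from the printing, and whether the console renders every character correctly still depends on the console's font and output encoding.

```python
#!/usr/bin/env python3
import sys


def build_greeting(argv):
    """Return the greeting for an argument list (argv[0] is the script name)."""
    # In Python 3, sys.argv already holds str (Unicode) values,
    # so no decode step is needed.
    name = argv[1] if len(argv) >= 2 else 'Świat'
    return 'Cześć ' + name


def main():
    # print() encodes the text for the console; no manual encode call.
    print(build_greeting(sys.argv))


# Standard boilerplate that calls main() when run as a script.
if __name__ == '__main__':
    main()
```

Note that the u'' prefixes and the encode/decode calls are gone, because Python 3 source files default to UTF-8 and string literals are Unicode.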

vz0
  • Wow... Python 3 doesn't even need `u'some string'` and the decode/encode anymore, it just accepts that I'm using diacritics! Thank you :) – Teysz Sep 29 '16 at 12:17