I'm on Google App Engine, so I can't use any C/C++ extension. I need a pure-Python way to convert Unicode/UTF-8 strings to lower or upper case. str.lower() and string.lowercase() don't do it.

Viet

1 Answer

In Python 2, a str holding UTF-8 bytes and a unicode object are two different types. Don't use the string module; call the appropriate method on the unicode object:

>>> print u'ĉ'.upper()
Ĉ

Decode str to unicode before using:

>>> print 'ĉ'.decode('utf-8').upper()
Ĉ
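
As a hedged sketch that is not part of the original answer, the decode-then-convert step above can be wrapped in two small helpers. The names `to_upper` and `to_lower` are hypothetical, and the code assumes the input is either text or UTF-8 encoded bytes. It is written to behave the same on Python 2.6+ (where `bytes` is an alias for `str`) and on Python 3 (where `str` is already Unicode and only `bytes` has `.decode()`).

```python
# -*- coding: utf-8 -*-
# Hypothetical helpers, not from the original answer.
# Assumption: byte input is UTF-8 unless another encoding is passed.


def _to_text(value, encoding="utf-8"):
    """Return value as a Unicode text object, decoding bytes if needed."""
    if isinstance(value, bytes):
        # Python 2: bytes is str, so this decodes UTF-8 str to unicode.
        # Python 3: this decodes bytes to str.
        return value.decode(encoding)
    return value


def to_upper(value, encoding="utf-8"):
    """Upper-case text; bytes are decoded first."""
    return _to_text(value, encoding).upper()


def to_lower(value, encoding="utf-8"):
    """Lower-case text; bytes are decoded first."""
    return _to_text(value, encoding).lower()


if __name__ == "__main__":
    # b"\xc4\x89" is the UTF-8 encoding of U+0109 (ĉ); U+0108 is Ĉ.
    assert to_upper(b"\xc4\x89") == u"\u0108"
    assert to_lower(u"\u0108") == u"\u0109"
```

The case conversion itself always runs on the Unicode object, which is the point of the answer; the helpers only make sure the decode happens once, at the boundary.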
Ignacio Vazquez-Abrams
  • Thanks. Is this applicable to Vietnamese? – Viet Jan 27 '10 at 09:59
  • It should be. It's not hard to test in the interactive interpreter. – Ignacio Vazquez-Abrams Jan 27 '10 at 10:03
  • My code does not work for Russian and Vietnamese; I haven't tried other languages: http://oladic.appspot.com/add/ОИЧУНКАЛС http://oladic.appspot.com/add/TÌNH%20YÊU http://oladic.appspot.com/add/ĉĉĉĉ – Viet Jan 27 '10 at 10:10
  • Finally it worked! Thank you very much! I wish I could vote more! – Viet Jan 27 '10 at 10:11
  • Viet: you probably want to URL-encode Unicode characters if you're putting them in a URL (although it's probably easier to just POST them as UTF-8, assuming you're using a form to submit them). – Wooble Jan 27 '10 at 15:26
  • Python 3: *'str' object has no attribute 'decode'* – mins Aug 13 '23 at 15:50
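
On the URL-encoding suggestion in the comments, here is a minimal Python 3 sketch using the standard library's `urllib.parse.quote`. The helper name `build_add_url` is hypothetical, and the base URL is simply the one that appears in the comment above; how the App Engine handler actually decodes the path is not shown in this thread, so treat the round trip as an assumption.

```python
from urllib.parse import quote, unquote

# Base URL taken from the comment thread; the helper itself is hypothetical.
BASE_URL = "http://oladic.appspot.com/add/"


def build_add_url(word, base=BASE_URL):
    """Percent-encode word as UTF-8 and append it to base."""
    # safe="" also escapes "/" so the word stays a single path segment.
    return base + quote(word, safe="")


def word_from_path(segment):
    """Reverse the encoding: percent-decode a path segment as UTF-8."""
    return unquote(segment)


if __name__ == "__main__":
    url = build_add_url("TÌNH YÊU")
    # Ì is U+00CC (UTF-8 C3 8C), Ê is U+00CA (UTF-8 C3 8A), space is %20.
    assert url == BASE_URL + "T%C3%8CNH%20Y%C3%8AU"
    assert word_from_path(url[len(BASE_URL):]) == "TÌNH YÊU"
```

On Python 2 the equivalent call is `urllib.quote`, which expects the text to be encoded to UTF-8 bytes first.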