
I'm having trouble with the ord() function and Unicode.

I want the decimal code point of each character entered, including non-ASCII letters.

For Example:

Applying ord() to each element of 'ÄÖÜ' gives me these values: [195, 132, 195, 150, 195, 156]

  • [195,132] = Ä
  • [195,150] = Ö
  • [195,156] = Ü

This is what I want:

  • [196] = Ä
  • [214] = Ö
  • [220] = Ü

Any clues?

Onilol
Benny H.
  • [this](http://stackoverflow.com/questions/23271542/rpython-ord-with-non-ascii-character) and [this](http://stackoverflow.com/questions/1342000/how-to-make-the-python-interpreter-correctly-handle-non-ascii-characters-in-stri) may help – R Nar Nov 06 '15 at 18:09

2 Answers


You want the Unicode code points, not the bytes in the UTF-8 encoding:

>>> mystring = u'ÄÖÜ'
>>> [ord(c) for c in mystring]
[196, 214, 220]
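
To see where the pairs in the question come from, here is a minimal sketch (not part of the original answer) that prints both views of the same text: the code points, and the bytes of its UTF-8 encoding. The variable names are only illustrative, and the escapes `\u00c4\u00d6\u00dc` are used in place of the literal `ÄÖÜ` so that the result does not depend on the source file's encoding.

```python
# -*- coding: utf-8 -*-
# Illustrative names; the escapes below spell the string ÄÖÜ.
text = u'\u00c4\u00d6\u00dc'

# One integer per character: the Unicode code points.
code_points = [ord(c) for c in text]

# One integer per byte of the UTF-8 encoding; each of these
# characters takes two bytes, hence the pairs in the question.
utf8_bytes = list(bytearray(text.encode('utf-8')))

print(code_points)  # [196, 214, 220]
print(utf8_bytes)   # [195, 132, 195, 150, 195, 156]
```
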
chepner
  • This might work if my string is written in the Python source itself; my string is stored in a variable, but this might work as well. Thanks a lot :) – Benny H. Nov 06 '15 at 21:51
  • The only difference between my answer and Robert's is the use of a Unicode literal versus a call to `unicode`; both produce the same Unicode object, subject to the default character encoding. (In my answer, the Unicode literal uses whatever character encoding is in effect; in Robert's, the string literal passed as the first argument must be UTF-8, or the second argument is wrong.) – chepner Nov 06 '15 at 21:53
  • Ah, I see. Can I use that Unicode literal on variables too? – Benny H. Nov 06 '15 at 23:19
  • Of course. All information is stored with the value itself; the "variable" is simply a name bound to the value. – chepner Nov 07 '15 at 01:48
  • Yeah, I know, but what does that look like? Something like u'mystring'? I guess this won't work because it's inside string quotation marks then. – Benny H. Nov 07 '15 at 09:11
  • How are you getting the string? – chepner Nov 07 '15 at 12:41
  • Sorry, I forgot about this thread ;-) It doesn't matter anyway because it works as mentioned above :-) Thanks a lot – Benny H. Dec 24 '15 at 11:33

This works for me:

>>> [ord(i) for i in unicode('ÄÖÜ','utf-8')]
[196, 214, 220]
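
If the string arrives in a variable rather than as a literal, the same idea can be wrapped in a small helper. This is only a sketch under assumptions: `to_code_points` is a hypothetical name, and it assumes the incoming bytes are UTF-8 encoded (pass a different `encoding` if they are not).

```python
def to_code_points(value, encoding='utf-8'):
    """Return the Unicode code points of value as a list of ints.

    Hypothetical helper: byte strings are decoded first (UTF-8 is an
    assumption), text that is already Unicode is used unchanged.
    """
    if isinstance(value, bytes):
        value = value.decode(encoding)
    return [ord(ch) for ch in value]


# The UTF-8 bytes of ÄÖÜ, as they might come from a file or socket.
raw = b'\xc3\x84\xc3\x96\xc3\x9c'
print(to_code_points(raw))  # [196, 214, 220]
```

Decoding before calling `ord` is what matters here; whether the decoding happens through `unicode(...)`, `.decode(...)`, or a Unicode literal makes no difference to the result.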
Robert