How to decode unicode string to unicode value

Question

I have a program in Python 2.7 that does the following:

Ask the user for input (in non-English characters, e.g. Hebrew, as well as English).
Split the sentence into a list of individual characters. (The input can be a short paragraph or an email.)
Convert the characters to Unicode values, so that in the end every item of the list is a Unicode escape string, e.g. "\u0391", that can be manipulated as a string.

I started quite well, but I can neither split the letters into a list nor print the right Unicode values.

Gr_text = unicode(raw_input("Type your message below:\n"), 'unicode-escape')

Gr = Gr_text.split()

print Gr

Example input:

Ενα απλο παραδειγμα.

The input (which translates as "A simple example") is Greek without accent marks. This sentence should be transformed into a list such as

['\u0395', '\u03bd', '\u03b1','\u0020', '\u03b1', '\u03c0', '\u03bb', '\u03bf','\u0020', '\u03c0', '\u03b1', '\u03c1', '\u03b1', '\u03b4', '\u03b5', '\u03b9', '\u03b3', '\u03bc', '\u03b1','\u0020',]

Note that I also want to convert spaces and special characters. I want every item of the list to be a string holding the Unicode value, not the plain letter, so that I can manipulate it and assign it another value.

Please give an example of the input and the corresponding expected result. — das-g, Oct 17 '15 at 16:12
You need to consider the order you're doing things, and also realize that Python 2.7 doesn't input Unicode characters - you'll need to use `decode`. — Mark Ransom, Oct 17 '15 at 16:20
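A minimal sketch of why `decode` with the right codec matters, assuming the terminal delivers UTF-8 bytes (that encoding is an assumption; the names `raw`, `wrong` and `right` are only illustrative). The `unicode-escape` codec treats every non-escape byte as Latin-1, so UTF-8 input comes out as two wrong characters per Greek letter:

```python
# -*- coding: utf-8 -*-
# Bytes as a UTF-8 terminal would hand them to raw_input() for "Ενα".
# (Assumption: the real encoding is whatever sys.stdin.encoding reports.)
raw = u'\u0395\u03bd\u03b1'.encode('utf-8')

# 'unicode-escape' reads each byte as a Latin-1 character: 6 wrong characters.
wrong = raw.decode('unicode-escape')

# Decoding with the actual input encoding recovers the 3 Greek letters.
right = raw.decode('utf-8')
```

The snippet avoids `print` statements, so it should behave the same on Python 2.7 and Python 3.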

score 0 · Answer 1 · edited May 23 '17 at 10:27

I have tested this and it works for me but your mileage may vary.

import sys, locale

Gr_text = raw_input('Type your message below:\n').decode(sys.stdin.encoding or locale.getpreferredencoding(True))

Gr = Gr_text.split()

print Gr

“Full Disclosure” credit goes to https://stackoverflow.com/a/477496/1427800
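Note that `split()` in the code above breaks the text on whitespace, so it yields words rather than single characters and drops the spaces. A possible sketch for getting one escape string per character, assuming the text has already been decoded to a unicode object as shown above (`to_escapes` and `sample` are hypothetical names, not part of the original code):

```python
# -*- coding: utf-8 -*-
def to_escapes(text):
    """Return one '\\uXXXX' string per character of an already-decoded text."""
    # ord() gives the code point; %04x pads it to at least four hex digits.
    # Iterating over the text keeps spaces and punctuation as items too.
    return [u'\\u%04x' % ord(ch) for ch in text]

sample = u'\u0395\u03bd\u03b1 \u03b1\u03c0\u03bb\u03bf'  # "Ενα απλο"
escapes = to_escapes(sample)
# escapes == ['\\u0395', '\\u03bd', '\\u03b1', '\\u0020',
#             '\\u03b1', '\\u03c0', '\\u03bb', '\\u03bf']
```

Each item is an ordinary six-character string (a backslash, `u`, and four hex digits), so it can be compared, replaced or looked up in a dict like any other string. Characters outside the Basic Multilingual Plane are not handled specially here.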

answered Oct 17 '15 at 16:31 by jesterjunk

Thank you, but you're missing something. I want to manipulate every letter of each word in the list. When I type `for i in Gr: for x in i: h = unicode(x) manipulate_every_unicode letter()` I can't get the Unicode value into the string h. – GeorgeG Oct 17 '15 at 16:48
