I have a program in Python 2.7 that does the following:

  1. Ask the user for input, which may contain non-English characters (e.g. Hebrew or Greek).
  2. Split the sentence into a list of individual characters. (The input can be a short paragraph or an email.)
  3. Convert each character to its Unicode value, so that in the end every item in the list is a Unicode escape such as "\u0391" that can be manipulated as a string.

I started out well, but I can't split the text into individual letters or print the right Unicode values.

Gr_text = unicode(raw_input("Type your message below:\n"), 'unicode-escape')

Gr = Gr_text.split()

print Gr

Example input:

Ενα απλο παραδειγμα.

The input (which translates as "A simple example") is in Greek, without accent marks. This sentence should be transformed into a list like:

['\u0395', '\u03bd', '\u03b1', '\u0020', '\u03b1', '\u03c0', '\u03bb', '\u03bf', '\u0020', '\u03c0', '\u03b1', '\u03c1', '\u03b1', '\u03b4', '\u03b5', '\u03b9', '\u03b3', '\u03bc', '\u03b1', '\u002e']

To be clear, I also want to convert spaces and special characters. I then want every item in the list to be a string holding the Unicode escape, not the plain letter, so that I can manipulate it and assign it another value.

GeorgeG
  • Please give an example of the input and the corresponding expected result. – das-g Oct 17 '15 at 16:12
  • You need to consider the order you're doing things, and also realize that Python 2.7 doesn't input Unicode characters - you'll need to use `decode`. – Mark Ransom Oct 17 '15 at 16:20

1 Answer

I have tested this and it works for me but your mileage may vary.

import sys, locale

Gr_text = raw_input('Type your message below:\n').decode(sys.stdin.encoding or locale.getpreferredencoding(True))

Gr = Gr_text.split()

print Gr
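
Note that `split()` only breaks the text on whitespace, so `Gr` holds whole words rather than single characters. The following is a minimal sketch of the per-character step, assuming the text has already been decoded to a unicode string as above. The helper name `to_escapes` is hypothetical, and a hard-coded sample stands in for the `raw_input` result so that the snippet runs unchanged on Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-

def to_escapes(text):
    """Return one backslash-u escape string per character of a unicode string.

    to_escapes is a hypothetical helper name, not part of any library.
    """
    # Iterating over a unicode string yields its characters one by one,
    # spaces and punctuation included. ord() gives the code point, which
    # is formatted as at least four lowercase hex digits.
    return [u'\\u{:04x}'.format(ord(ch)) for ch in text]


# Hard-coded stand-in for the decoded raw_input() result (an assumption
# made for illustration; it spells the first two words of the example).
sample = u'\u0395\u03bd\u03b1 \u03b1\u03c0\u03bb\u03bf.'

escapes = to_escapes(sample)
print(escapes)
```

Each item is an ordinary string made of a backslash, the letter `u` and hex digits, so it can be compared, replaced or looked up in a dictionary like any other string. Characters outside the Basic Multilingual Plane are not handled specially in this sketch.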


“Full Disclosure” credit goes to https://stackoverflow.com/a/477496/1427800

jesterjunk
  • Thank you, but this misses something. I want to manipulate every letter of each word in the list. When I type `for i in Gr: for x in i: h = unicode(x) manipulate_every_unicode letter()`, I can't get the Unicode value into the string `h`. – GeorgeG Oct 17 '15 at 16:48