I have the following simple Python script that prints the name of each letter in every line of a file:

import sys
import unicodedata
import codecs

with codecs.open(sys.argv[1],'r', encoding='utf-8') as file:
        lines = file.readlines()


counter = 0

for line in lines:
        print "In line " + str(counter)
        for unicode_letter in line:
                print unicodedata.name(unicode_letter).split()[-1]
        counter += 1


print "\nI'm Done Sir."

But I'm getting the following error:

In line 0
ALEF
LAM
SEEN
LAM
ALEF
MEEM
SPACE
AIN
LAM
YEH
KAF
MEEM
Traceback (most recent call last):
  File "convert_to_phonems.py", line 16, in <module>
    print unicodedata.name(unicode_letter).split()[-1]
ValueError: no such name

I'm a real beginner in Python, and I really like how it can map Unicode characters and tell you which letter or character each one is in a string.

Edit: This is a sample of what the text looks like in the input file:

السلام عليكم
السلام عليكم و رحمة الله
السلام عليكم و رحمة الله و بركاته
الحمد لله
كيف حالك
كيف الحال
0x01Brain
  • Can you include the input file in your question? – Others Dec 29 '15 at 05:11
  • It seems like it's choking on the newline or carriage return. Try reading your file with `rb`. – Burhan Khalid Dec 29 '15 at 05:20
  • No, reading it with `rb` didn't help. – 0x01Brain Dec 29 '15 at 05:23
  • Solved. Thanks for your idea. I added an if statement to check for newline and space: `if unicode_letter == '\n' or unicode_letter == ' ': print "NEWLINEORSPACE" else: print unicodedata.name(unicode_letter).split()[2]` – 0x01Brain Dec 29 '15 at 05:38
  • I also had some Arabic vowel marks (harakat) in the text. I removed them from some words and it worked. – 0x01Brain Dec 29 '15 at 05:40
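
For reference, the workaround described in the comments can probably be written without a special case per character: `unicodedata.name` takes an optional second argument that is returned when a character has no name, which is the case for control characters such as the newline at the end of each line. The sketch below is only an illustration, not the asker's final code. The names `letter_names`, `fallback`, and `skip_marks` are hypothetical, and skipping harakat through `unicodedata.combining` is an assumption about what was wanted.

```python
import unicodedata


def letter_names(line, fallback=u"UNNAMED", skip_marks=False):
    """Return the last word of each character's Unicode name.

    Characters without a name (for example a newline) yield `fallback`
    instead of raising ValueError. With skip_marks=True, combining marks
    such as Arabic harakat are left out (an assumed preference).
    """
    names = []
    for ch in line:
        # combining() is non-zero for combining marks like FATHA or SHADDA.
        if skip_marks and unicodedata.combining(ch):
            continue
        # The second argument is returned when the character has no name.
        full_name = unicodedata.name(ch, fallback)
        names.append(full_name.split()[-1])
    return names


# KAF, YEH, FEH, a space, HAH, ALEF, LAM, KAF, then a newline.
sample = u"\u0643\u064a\u0641 \u062d\u0627\u0644\u0643\n"
for name in letter_names(sample):
    print(name)
```

Note that harakat do have Unicode names (U+064E is `ARABIC FATHA`), so they should not trigger `ValueError` themselves. Their names have only two words, though, so the `.split()[2]` used in the comment above would raise `IndexError` on them, which may be why removing them helped. Using `.split()[-1]`, as in the original script, avoids that.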
