I have the following simple Python script that prints the name of each letter in every line of a file:
import sys
import unicodedata
import codecs
with codecs.open(sys.argv[1],'r', encoding='utf-8') as file:
    lines = file.readlines()
    counter = 0
    for line in lines:
        print "In line " + str(counter)
        for unicode_letter in line:
            print unicodedata.name(unicode_letter).split()[-1]
        counter += 1
print "\nI'm Done Sir."
But I'm getting the following error:
In line 0
ALEF
LAM
SEEN
LAM
ALEF
MEEM
SPACE
AIN
LAM
YEH
KAF
MEEM
Traceback (most recent call last):
  File "convert_to_phonems.py", line 16, in <module>
    print unicodedata.name(unicode_letter).split()[-1]
ValueError: no such name
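One likely cause, consistent with the error appearing right after MEEM (the last letter of the first line), is that readlines() keeps each line's trailing newline, and control characters such as "\n" have no name in the Unicode database, so unicodedata.name() raises ValueError for them. unicodedata.name() accepts an optional default that is returned instead of raising (this argument exists in Python 2 as well). A minimal Python 3 sketch; the helper letter_name and the "<no name>" placeholder are hypothetical:

```python
import unicodedata


def letter_name(ch, default="<no name>"):
    """Return the last word of ch's Unicode name, or `default` if it has none."""
    name = unicodedata.name(ch, None)  # None is returned instead of raising ValueError
    if name is None:
        return default
    return name.split()[-1]


# Without a default, a newline reproduces the error from the traceback.
try:
    unicodedata.name("\n")
except ValueError as exc:
    print("newline:", exc)

print(letter_name("\u0645"))  # MEEM
print(letter_name("\n"))      # <no name>
```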
I'm really a beginner in Python. I must say I really like how it can map Unicode characters and tell you which letter or character each one in a string is.
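For reference, unicodedata.name() returns the character's full name from the Unicode Character Database (for example "ARABIC LETTER ALEF"), and the script's .split()[-1] keeps only the last word, which is why the output shows just ALEF, LAM, and so on. A small Python 3 sketch, assuming the first word of the sample input uses the standard Arabic code points written here as escapes:

```python
import unicodedata

# First word of the sample input as escapes: ALEF LAM SEEN LAM ALEF MEEM
word = "\u0627\u0644\u0633\u0644\u0627\u0645"

for ch in word:
    full_name = unicodedata.name(ch)    # e.g. "ARABIC LETTER ALEF"
    short_name = full_name.split()[-1]  # the part the script above prints
    print("U+%04X  %-20s  %s" % (ord(ch), full_name, short_name))
```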
Edit: This is a sample of what the text in the input file looks like:
السلام عليكم
السلام عليكم و رحمة الله
السلام عليكم و رحمة الله و بركاته
الحمد لله
كيف حالك
كيف الحال
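A minimal Python 3 sketch of the whole script, assuming a move from Python 2 is acceptable (print as a function, built-in open() with an encoding argument in place of codecs.open()). It strips the line ending before iterating and passes a default to unicodedata.name(), so any other unnamed character (a tab, say) is reported instead of crashing. The helper names describe_lines and main, and the "<no name>" placeholder, are hypothetical:

```python
import sys
import unicodedata


def describe_lines(lines):
    """Yield (line number, list of letter names) for each line.

    The line ending is stripped first, and characters that have no
    Unicode name are reported as "<no name>" instead of raising ValueError.
    """
    for number, line in enumerate(lines):  # enumerate() replaces the manual counter
        names = []
        for ch in line.rstrip("\r\n"):
            full_name = unicodedata.name(ch, None)  # None instead of ValueError
            names.append(full_name.split()[-1] if full_name else "<no name>")
        yield number, names


def main(path):
    # Built-in open() with an encoding replaces codecs.open() in Python 3.
    with open(path, encoding="utf-8") as f:
        for number, names in describe_lines(f):
            print("In line", number)
            for name in names:
                print(name)
    print("\nI'm Done Sir.")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Run it the same way as before, e.g. python3 convert_to_phonems.py input.txt; each line of the sample file should then print its letters and move on to the next line instead of stopping at the newline.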