How to iterate through an Arabic word in Python?

Question

I have the following simple Python script that prints the name of each letter in every line of the input file:

import sys
import unicodedata
import codecs

with codecs.open(sys.argv[1],'r', encoding='utf-8') as file:
        lines = file.readlines()


counter = 0

for line in lines:
        print "In line " + str(counter)
        for unicode_letter in line:
                print unicodedata.name(unicode_letter).split()[-1]
        counter += 1


print "\nI'm Done Sir."

But I'm getting the following error:

In line 0
ALEF
LAM
SEEN
LAM
ALEF
MEEM
SPACE
AIN
LAM
YEH
KAF
MEEM
Traceback (most recent call last):
  File "convert_to_phonems.py", line 16, in <module>
    print unicodedata.name(unicode_letter).split()[-1]
ValueError: no such name

I'm really a beginner in Python. I like how it can map Unicode characters and tell you which letter or character each one is in a string.

Edit: This is a sample of what the text looks like in the input file:

السلام عليكم
السلام عليكم و رحمة الله
السلام عليكم و رحمة الله و بركاته
الحمد لله
كيف حالك
كيف الحال

It seems like it's choking on the newline or carriage return. Try reading your file with `rb`. – Burhan Khalid, Dec 29 '15 at 05:20
Solved. Thanks for the idea. I added an if statement to check for newline and space: `if unicode_letter == '\n' or unicode_letter == ' ': print "NEWLINEORSPACE" else: print unicodedata.name(unicode_letter).split()[2]` – 0x01Brain, Dec 29 '15 at 05:38
I also had some Arabic vowel marks (harakat) in the text. I removed them from some words and it worked. – 0x01Brain, Dec 29 '15 at 05:40
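
For reference, the traceback is consistent with the newline diagnosis in the comments: `unicodedata.name()` raises `ValueError: no such name` for characters that have no Unicode name, and control characters such as `'\n'` are among them. A minimal sketch of one way to avoid the exception, assuming Python 3 (the script in the question is Python 2), is to strip the line ending and pass a default as the second argument to `unicodedata.name()`. The helper name `letter_names` and the placeholder string `"UNNAMED"` are made up for illustration and are not from the original post.

```python
import unicodedata


def letter_names(line):
    """Return the last word of each character's Unicode name in `line`."""
    names = []
    for ch in line.rstrip("\r\n"):  # drop the trailing line ending first
        # The second argument is returned instead of raising ValueError
        # when a character (for example a control character) has no name.
        full_name = unicodedata.name(ch, "UNNAMED")
        names.append(full_name.split()[-1])
    return names


# Two sample lines taken from the input file shown in the question.
for number, line in enumerate(["السلام عليكم\n", "كيف حالك\n"]):
    print("In line " + str(number))
    for name in letter_names(line):
        print(name)
```

Taking the last word with `[-1]`, as the original script does, also works for one-word names such as `SPACE`, so the space character needs no special case here.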
