I am trying to count how many times the same word occurs in an Urdu document that is saved as UTF-8.
For example, I have a document containing three identical words separated by spaces:
خُداوند خُداوند خُداوند
I tried to count the words by reading the file using the following code:
import codecs

file_obj = codecs.open(path, encoding="utf-8")
lst = repr(file_obj.readline()).split(" ")
word = lst[0]
count = 0
for w in lst:
    if word == w:
        count += 1
print count
However, the value of count I get is 1, while I expected 3.
How does one compare Unicode strings?
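For reference, here is a minimal sketch of the comparison I would expect to work. It rests on two assumptions of mine: that the call to repr() is what breaks the match (in Python 2 it returns the escaped literal, so the first token starts with u' and the last one ends with a quote), and that normalizing to NFC is a sensible precaution when the text contains combining marks. The function name count_first_word and the inline sample string are only for illustration.

```python
# -*- coding: utf-8 -*-
import unicodedata


def count_first_word(line):
    """Count how often the first word of `line` occurs in that line."""
    # Assumption: NFC normalization, so that the same visible word typed
    # with a different sequence of code points still compares equal.
    normalized = unicodedata.normalize("NFC", line)
    # split() with no argument splits on any whitespace and also drops
    # the trailing newline left by readline().
    words = normalized.split()
    if not words:
        return 0
    # Unicode strings compare with ==, so list.count() is enough here.
    return words.count(words[0])


# Sample line standing in for file_obj.readline()
sample = u"خُداوند خُداوند خُداوند\n"
print(count_first_word(sample))  # prints 3
```

With the file from the question, the same function could be applied to file_obj.readline() directly, without wrapping the line in repr().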