I'm writing a script that reads a corpus file and finds words ending in a given suffix. The corpus contains Persian words, so it is UTF-8 encoded. However, when I search for a Persian suffix I get no results, while English suffixes come back fine.
from __future__ import unicode_literals
import nltk
import sys
for line in open("corpus.txt"):
    for word in line.split():
        if word.endswith('ب'):
            print(word)
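
One possible cause, assuming this runs on Python 2: `open()` yields byte strings, while `unicode_literals` turns `'ب'` into a Unicode string, so the two are never compared as the same kind of text. A minimal sketch of a variant that decodes the file while reading it is below. The helper name `words_with_suffix` and its `encoding` parameter are made up for illustration and are not part of the original script.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
import io


def words_with_suffix(path, suffix, encoding="utf-8"):
    """Return the words in the file at `path` that end with `suffix`.

    Hypothetical helper: assumes the corpus is whitespace-separated text
    stored in the given encoding.
    """
    matches = []
    # io.open decodes bytes into Unicode text on both Python 2 and 3,
    # so each word and the suffix are the same string type.
    with io.open(path, encoding=encoding) as corpus:
        for line in corpus:
            for word in line.split():
                if word.endswith(suffix):
                    matches.append(word)
    return matches
```

It could then be called as `for word in words_with_suffix("corpus.txt", "ب"): print(word)`. On Python 2 the script file itself also needs the `# -*- coding: utf-8 -*-` line so that the Persian literal is read correctly.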