I am using the NLTK movie_reviews corpus for sentiment analysis. I replaced the corpus's existing text files with Arabic-language text files, but now I can't read or print them; I get an encoding error.
My code:
import nltk
from nltk.corpus import movie_reviews
documents = []
for category in movie_reviews.categories():
    for fileid in movie_reviews.fileids(category):
        documents.append([movie_reviews.words(fileid), category])
print(documents[0])
I get this error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 0: ordinal not in range(128)
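For reference, the same error can be reproduced without NLTK. This is only a minimal sketch under an assumption: byte 0xef at position 0 is the first byte of a UTF-8 byte order mark (EF BB BF), so the replaced files were probably saved as UTF-8 with a BOM. The sample word and the variable names are made up for illustration; the actual files may differ.

```python
import codecs

# Hypothetical sample: an Arabic word stored as UTF-8 with a leading BOM,
# which is what a file starting with byte 0xef would typically contain.
raw = codecs.BOM_UTF8 + "ممتاز".encode("utf-8")

# Decoding as ASCII fails on the very first byte, as in the traceback.
try:
    raw.decode("ascii")
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = exc

# Decoding as "utf-8-sig" reads the text and drops the BOM.
text = raw.decode("utf-8-sig")
print(ascii_error)
print(text)
```

If this assumption holds, the reader is decoding the files as ASCII instead of UTF-8. NLTK corpus readers such as CategorizedPlaintextCorpusReader accept an encoding argument, so building a reader over the Arabic files with an explicit UTF-8 encoding may be worth trying.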