I'm writing a program that uses the Python NLTK library to determine whether a review is positive or negative. When I tokenize the reviews and store the words in a list, I keep getting the above error. The code up to and including the lines that raise the error is:
from nltk.tokenize import word_tokenize
...
short_pos = open("reviews/pos_reviews.txt", "r").read()
short_neg = open("reviews/neg_reviews.txt", "r").read()
documents = []
for r in short_pos.split('\n'):
    documents.append((r, "pos"))
for r in short_neg.split('\n'):
    documents.append((r, "neg"))
all_words = []
short_pos_words = word_tokenize(short_pos)
short_neg_words = word_tokenize(short_neg)
The second-to-last line is where the error is reported. If I comment out that line, the same error appears on the following line instead. I'm not sure where this error comes from, as I didn't think I was working with Unicode at all. Any help would be appreciated!
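
For reference, here is a minimal sketch of how the files could be read with an explicit encoding. It assumes the error is a decoding problem and that the review files are UTF-8; I haven't confirmed either. `read_text` is a hypothetical helper, not something from NLTK.

```python
import io


def read_text(path, encoding="utf-8"):
    """Read a whole file and return it as decoded (unicode) text.

    Hypothetical helper. The UTF-8 default is an assumption about the
    review files; pass a different encoding if they were saved otherwise.
    """
    # io.open accepts an `encoding` argument on both Python 2 and Python 3,
    # so the bytes are decoded at read time rather than later on.
    with io.open(path, "r", encoding=encoding) as f:
        return f.read()
```

With this, `short_pos = read_text("reviews/pos_reviews.txt")` would replace the `open(...).read()` call, and the result would be passed to `word_tokenize` as before. If the files are not actually UTF-8, this raises a `UnicodeDecodeError` at the read step, which would at least show which file is the problem.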