
I'm following a sentiment analysis tutorial whose author provides the files to use. These are simple .txt files, and when he runs the code it works fine. But when I run the same code, I get the error below.

with open("positive.txt","r") as f:
    for line in f.read().split('\n'):
        analysis = TextBlob(line)
        if analysis.sentiment.polarity > 0:
            pos_correct += 1
        pos_count +=1

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf3 in position 4645: invalid continuation byte

How can the same code and the same file produce this error for me but not for him?

Here is a link to the files, in case it helps: https://pythonprogramming.net/static/downloads/short_reviews/

Update: I was running this on my Mac when I hit the issue. I tried it on my Windows PC and it works fine. Any ideas what the difference would be? I did everything the same way.

Conweezy
  • You may need to convert "positive.txt" to the `utf-8` encoding, or specify the encoding of the "positive.txt" file. – monkut May 25 '22 at 01:58
  • I don't get an error when I read the file and just count the lines instead of doing the processing you're doing. Please [edit] your question and post the [*full text* of the error/traceback](https://meta.stackoverflow.com/q/359146) you are getting. – MattDMo May 25 '22 at 02:08
  • see if [this](https://stackoverflow.com/questions/30996289/utf8-codec-cant-decode-byte-0xf3) helps. – ytung-dev May 25 '22 at 02:08
  • See also https://nedbatchelder.com/text/unipain.html – Karl Knechtel May 25 '22 at 03:07

1 Answer

It seems the file is not encoded in UTF-8. Could you try opening it with io.open and the latin-1 encoding instead?

from textblob import TextBlob
import io

# dummy variables initialization
pos_correct = 0
pos_count = 0

with io.open("positive.txt", encoding='latin-1') as f:
    for line in f.read().split('\n'):
        analysis = TextBlob(line)
        if analysis.sentiment.polarity > 0:
            pos_correct += 1
        pos_count +=1
tax evader
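
A likely explanation for the Mac/Windows difference: when `open()` is called without an `encoding` argument, Python 3 uses the platform's preferred encoding. That is usually UTF-8 on macOS, where a lone byte 0xf3 is invalid, and often cp1252 on Windows, where 0xf3 decodes to "ó". The sketch below illustrates this on a made-up byte string rather than the actual file. `decode_with_fallback` and the sample bytes are hypothetical, chosen only for illustration.

```python
import locale


def decode_with_fallback(raw, encodings=("utf-8", "latin-1")):
    """Hypothetical helper: decode raw bytes with the first encoding that works.

    Returns a (text, encoding_used) tuple.
    """
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the given encodings could decode the data")


if __name__ == "__main__":
    # What open() falls back to when no encoding is passed; this varies by platform.
    print("default text encoding:", locale.getpreferredencoding(False))

    # Made-up sample: 0xf3 is "ó" in Latin-1, but starts an invalid sequence in UTF-8.
    sample = b"raz\xf3n"
    text, used = decode_with_fallback(sample)
    print(text, "decoded as", used)
```

For the file in the question, passing the encoding explicitly, as in `open("positive.txt", encoding="latin-1")`, should make the behavior the same on both machines, assuming the file really is Latin-1 or cp1252.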