UnicodeDecodeError: 'utf8' codec can't decode byte error

Question

I have a CSV file with four columns: tweet_id, label, topic, and text. In one of the rows, the "text" column has the value:

    I'm wit chu!! â€œ@ShayDiddy: Officially boycotting @ups!!! Calling @apple to curse them out next for using them wasting my time!â€

I am using this code for importing the data:

    def createTrainingCorpus(corpusFile):
        import csv
        corpus=[]
        with open(corpusFile,'rb') as csvfile:
            lineReader = csv.reader(csvfile,delimiter=',')
            r=1
            for row in lineReader:
                if r<257:
                    corpus.append({"tweet_id":row[2],"label":row[1],"topic":row[0],"text":row[4]})
                    r=r+1
        return corpus

    corpusFile= "/Users/name/Desktop/corpus.csv"
    TrainingData= createTrainingCorpus(corpusFile)

This row does not get added to the TrainingData list, and I receive this error:

    UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 6: ordinal not in range(128)

The TrainingData list contains every element as expected until the loop reaches the row whose "text" value is shown above. I searched for the error but couldn't find a solution that worked for me. Please help.

Have you tried specifying `encoding='utf-8'` as part of your `with open(..., encoding='utf-8') as ...:`? — TemporalWolf, Dec 13 '16 at 22:38
It looks like your example text has a missing character at the end. And I'm guessing you're on Windows? — Mark Ransom, Dec 13 '16 at 22:42
I am using a Mac, and @TemporalWolf yes, I have tried using encoding='utf-8' too. I get "'encoding' is an invalid keyword argument for this function". I am using Python 2.7.9. — Shivam Saxena, Dec 13 '16 at 22:48
You could try [codecs.open](http://stackoverflow.com/a/844443/3579910) — TemporalWolf, Dec 13 '16 at 23:31
How did you create the file named by corpusFile? Did you perhaps save it from Excel which doesn't use utf-8? More to the point, what encoding does that file use? — Robᵩ, Dec 14 '16 at 00:54
Python 2.7's `csv` reader doesn't support Unicode without some help. See the [examples](https://docs.python.org/2/library/csv.html#examples) at the bottom of the [csv documentation](https://docs.python.org/2/library/csv.html). Better, switch to Python 3. — Mark Tolonen, Dec 14 '16 at 02:09
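
The suggestions in the comments can be sketched roughly as follows. This is a minimal sketch, not a confirmed fix. It assumes the file really is UTF-8, which the reported byte 0xe2 is consistent with, since a left curly quote encodes as E2 80 9C in UTF-8. If the file was saved in another encoding, the decode step would still fail. The names `read_utf8_rows`, `create_training_corpus`, and `max_rows` are hypothetical. The column indices are copied unchanged from the question's code. On Python 2 the rows are parsed as bytes and each cell is decoded afterwards. On Python 3 the file is opened with an explicit encoding.

```python
import csv
import io
import sys

PY2 = sys.version_info[0] == 2


def read_utf8_rows(path):
    """Yield each CSV row as a list of text (unicode) strings.

    Assumes the file is UTF-8 encoded.
    """
    if PY2:
        # Python 2: the csv module works on byte strings, so parse the
        # raw bytes first and decode every cell afterwards.
        with open(path, 'rb') as f:
            for row in csv.reader(f, delimiter=','):
                yield [cell.decode('utf-8') for cell in row]
    else:
        # Python 3: let open() do the decoding; newline='' is what the
        # csv documentation recommends for files passed to csv.reader.
        with io.open(path, encoding='utf-8', newline='') as f:
            for row in csv.reader(f, delimiter=','):
                yield row


def create_training_corpus(path, max_rows=256):
    """Collect up to max_rows rows as dicts (256 matches `r<257` above)."""
    corpus = []
    for row in read_utf8_rows(path):
        if len(corpus) >= max_rows:
            break
        # Column indices are taken as-is from the question's code.
        corpus.append({"tweet_id": row[2], "label": row[1],
                       "topic": row[0], "text": row[4]})
    return corpus
```

With this approach every "text" value is a unicode string, so later code that mixes it with other text should not trigger an implicit ASCII decode.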
