I have a CSV file with 4 columns: tweet_id, label, topic, and text. In one of the rows, the "text" column has the value:
I'm wit chu!! “@ShayDiddy: Officially boycotting @ups!!! Calling @apple to curse them out next for using them wasting my time!”
I am using this code for importing the data:
def createTrainingCorpus(corpusFile):
    import csv
    corpus=[]
    with open(corpusFile,'rb') as csvfile:
        lineReader = csv.reader(csvfile,delimiter=',')
        r=1
        for row in lineReader:
            if r<257:
                corpus.append({"tweet_id":row[2],"label":row[1],"topic":row[0],"text":row[4]})
                r=r+1
    return corpus
corpusFile= "/Users/name/Desktop/corpus.csv"
TrainingData= createTrainingCorpus(corpusFile)
The row shown above is not added to the list TrainingData, and I receive this error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 6: ordinal not in range(128)
The TrainingData list contains all the expected elements until the loop reaches the row with the "text" value shown above. I searched for the error but couldn't find a solution that worked for me. Please help.
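For reference, here is a minimal sketch of what the error message seems to point at. It assumes Python 3 and assumes the file is UTF-8 encoded, which the byte 0xe2 suggests (it is the first byte of a curly quote in UTF-8) but which has not been confirmed. The helper name `read_rows` and the sample row, including its column order and date field, are hypothetical and only there for illustration; this is not the code from the question.

```python
import csv
import io


def read_rows(stream, limit=256):
    """Collect up to `limit` rows from an already-decoded text stream."""
    rows = []
    for row in csv.reader(stream, delimiter=','):
        if len(rows) >= limit:
            break
        rows.append(row)
    return rows


# Hypothetical sample row (shortened tweet, assumed column order), encoded as UTF-8 bytes.
raw = (u'apple,negative,123,2011-10-18,'
       u'"I\'m wit chu!! \u201c@ShayDiddy: wasting my time!\u201d"\n').encode('utf-8')

# The left curly quote becomes the bytes E2 80 9C in UTF-8,
# so 0xe2 is the first non-ASCII byte a decoder meets.
assert u'\u201c'.encode('utf-8') == b'\xe2\x80\x9c'

# Decoding the same bytes as ASCII fails on that byte.
ascii_error = None
try:
    raw.decode('ascii')
except UnicodeDecodeError as exc:
    ascii_error = exc

# Decoding as UTF-8 first, then parsing, keeps the quotes intact.
text_stream = io.StringIO(raw.decode('utf-8'), newline='')
rows = read_rows(text_stream)
```

With a real file, the equivalent under the same assumptions would be to open it with `open(path, newline='', encoding='utf-8')` and pass the file object to `read_rows`.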