I'm following this tutorial: https://towardsdatascience.com/creating-the-twitter-sentiment-analysis-program-in-python-with-naive-bayes-classification-672e5589a7ed Everything has worked so far, but I keep getting an error when I try to run this code:

def buildTrainingSet(corpusFile, tweetDataFile):
    import csv
    import time

    corpus = []

    with open(corpusFile,'rb') as csvfile:
        lineReader = csv.reader(csvfile,delimiter=',', quotechar="\"")
        for row in lineReader:
            corpus.append({"tweet_id":row[2], "label":row[1], "topic":row[0]})

    rate_limit = 180
    sleep_time = 900/180

    trainingDataSet = []

    for tweet in corpus:
        try:
            status = twitter_api.GetStatus(tweet["tweet_id"])
            print("Tweet fetched" + status.text)
            tweet["text"] = status.text
            trainingDataSet.append(tweet)
            time.sleep(sleep_time)
        except:
            continue
    # now we write them to the empty CSV file
    with open(tweetDataFile,'wb') as csvfile:
        linewriter = csv.writer(csvfile,delimiter=',',quotechar="\"")
        for tweet in trainingDataSet:
            try:
                linewriter.writerow([tweet["tweet_id"], tweet["text"], tweet["label"], tweet["topic"]])
            except Exception as e:
                print(e)
    return trainingDataSet
 #================
corpusFile = "C:\Users\Vilma\Documents\CIS450\group prjt/corpus.csv"
tweetDataFile = "C:\Users\Vilma\Documents\CIS450\group prjt/tweetDataFile.csv"

trainingData = buildTrainingSet (corpusFile, tweetDataFile)

I keep getting this error:

 File "<ipython-input-33-54fea359e8f9>", line 1
    corpusFile = "C:\Users\Vilma\Documents\CIS450\group prjt/corpus.csv"
                ^
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape

I even tried putting r' in front of C:\Users\Vilma\Documents\CIS450\group prjt/corpus.csv but I still keep getting the error.

Update: I fixed that error by changing the code to:

corpusFile = r'C:\Users\Vilma\Documents\CIS450\group prjt\corpus.csv'
tweetDataFile = r'C:\Users\Vilma\Documents\CIS450\group prjt\tweetDataFile.csv'

However, a new error pops up:

File "<ipython-input-41-f44768dabc6e>", line 7, in buildTrainingSet
    with open(corpusFile,'rb') as csvfile:

FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\Vilma\\Documents\\CIS450\\group prjt\\corpus.csv'
Kitana96
  • Maybe this post will help: https://stackoverflow.com/questions/1347791/unicode-error-unicodeescape-codec-cant-decode-bytes-cannot-open-text-file. I understand you've already used the "r" option but this post mentions more than that and it could be worth a peek! – MT_dev Mar 01 '20 at 23:34
  • Duplicate of "Unicode Error "unicodeescape" codec can't decode bytes... Cannot open text files in Python 3": https://stackoverflow.com/questions/1347791/unicode-error-unicodeescape-codec-cant-decode-bytes-cannot-open-text-file – Jongware Mar 02 '20 at 00:23

2 Answers

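Try correcting your file path.

corpusFile = "C:\Users\Vilma\Documents\CIS450\group prjt/corpus.csv"

    Should be:

    corpusFile = r"C:\Users\Vilma\Documents\CIS450\group prjt\corpus.csv"

The separators are now all backslashes, and the r prefix makes the string raw so that the \U in C:\Users is not read as the start of a Unicode escape.

Hope this helps!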

abigya
  • Hey! Welcome to StackOverFlow :) Would you mind explaining why you switched the type of slash? In general, answers are more helpful if you can explain why you did what you did, and why you think it works! – MT_dev Mar 01 '20 at 23:33
  • I noticed an error in the file path: the slash before corpus.csv was different from the rest. Normally, file paths do not mix two kinds of slashes. – abigya Mar 01 '20 at 23:43

You can use:

corpusFile = r"C:\Users\Vilma\Documents\CIS450\group prjt\corpus.csv"
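
As a minimal sketch of why the r prefix matters, assuming Python 3: in an ordinary string literal, \U starts an eight-digit Unicode escape, so "C:\Users..." fails before the code even runs. The spellings below all avoid that. ntpath (the Windows flavour of os.path, importable on any OS) is used here only so the comparison runs anywhere, and the variable names are illustrative.

```python
import ntpath  # Windows path rules, importable on any operating system

# Raw string: backslashes are kept literally, so \U is not an escape.
raw_path = r"C:\Users\Vilma\Documents\CIS450\group prjt\corpus.csv"

# Ordinary string with every backslash doubled.
escaped_path = "C:\\Users\\Vilma\\Documents\\CIS450\\group prjt\\corpus.csv"

# Forward slashes, which Windows also accepts as separators.
forward_path = "C:/Users/Vilma/Documents/CIS450/group prjt/corpus.csv"

# Mixed separators, as in the question, but with the r prefix added.
mixed_path = r"C:\Users\Vilma\Documents\CIS450\group prjt/corpus.csv"

# normpath turns forward slashes into backslashes, so every spelling agrees.
normalized = {
    ntpath.normpath(p)
    for p in (raw_path, escaped_path, forward_path, mixed_path)
}
print(len(normalized))

# Compiling the unprefixed literal reproduces the SyntaxError from the question.
raised = False
try:
    compile('p = "C:\\Users"', "<example>", "exec")
except SyntaxError as err:
    raised = True
    print(err)
```

Any of the first three spellings should work in the assignment to corpusFile. Mixed slashes are not what triggers the unicodeescape error; the unprefixed \U is.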

If you are not finding the file, please make sure the file exists in the folder.
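
One way to check this from Python is sketched below, assuming only the standard-library pathlib module. The helper name describe_path is hypothetical, not part of the tutorial. It reports whether the file exists and, if not, lists what is actually in the folder, which can reveal a misspelled name or a hidden extension such as corpus.csv.txt.

```python
from pathlib import Path


def describe_path(path_str):
    """Return a short report on whether path_str exists and what its folder holds."""
    path = Path(path_str)
    if path.is_file():
        return f"found: {path}"
    folder = path.parent
    if not folder.is_dir():
        return f"folder does not exist: {folder}"
    # The folder is there but the file is not: show what the folder contains.
    names = sorted(child.name for child in folder.iterdir())
    return f"{path.name} not in {folder}; folder contains: {names}"


print(describe_path(r"C:\Users\Vilma\Documents\CIS450\group prjt\corpus.csv"))
```

If the report says the folder does not exist, the problem is in the directory part of the path rather than in the file name.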

haru