I have a .txt file that contains multiple lines of sentences; let's say the file is called sentences.txt. I also have a dictionary, sentiment_scores, that contains predefined sentiment values for about 2500 words. My goal is to return a dictionary that predicts the sentiment value of each word that is not in sentiment_scores. I do this by averaging the sentiment scores of all the sentences in which the word appears.
def predict_new_term_sentiment():  # placeholder name; the enclosing def was not shown in the original snippet
    with open('sentences.txt', 'r') as f:
        sentences = [line.strip() for line in f]  # the with block closes the file on exit

    new_term_sent = {}
    for line in sentences:
        for word in line.split():  # iterate through the words in the sentence
            if word not in sentiment_scores:
                new_term_sent[word] = 0  # assign the word a sentiment value of 0 initially

    for key in new_term_sent:
        score = 0
        num_sentences = 0
        for sentence in sentences:
            if key in sentence.split():
                num_sentences += 1
                val = get_sentiment(sentence)  # this function returns the sentiment of a sentence
                score += val
        if num_sentences != 0:
            average = round(score / num_sentences, 1)
            new_term_sent[key] = average
    return new_term_sent
Please note: this method works, but it is too slow; it takes about 80 seconds to run on my laptop.
My question is therefore: how can I do this more efficiently? I have tried just using .readlines() on sentences.txt, but that did not work (I can't figure out why, but I know it has to do with iterating through the text file multiple times; maybe a pointer is disappearing somehow). Thank you in advance!
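For reference, one direction that might cut the runtime is to make a single pass over the sentences: score each sentence once, then add that score to a running total for every unknown word it contains. The following is only a sketch under assumptions, not a tested fix for my real data. It assumes get_sentiment takes a sentence string and returns a number; load_sentences and predict_new_terms are hypothetical names that do not appear in the code above, and the scoring function is passed in as a parameter so the sketch stays self-contained. Reading the file once into a list also avoids iterating over the file object a second time, since a file object is exhausted after one full pass.

```python
from collections import defaultdict


def load_sentences(path):
    """Read the file once into a list so it can be iterated repeatedly."""
    with open(path, "r") as f:
        return [line.strip() for line in f]


def predict_new_terms(sentences, sentiment_scores, get_sentiment):
    """Average sentence sentiment for each word missing from sentiment_scores.

    get_sentiment is assumed to be a callable: sentence string -> number.
    """
    totals = defaultdict(float)  # word -> sum of scores of sentences containing it
    counts = defaultdict(int)    # word -> number of sentences containing it

    for sentence in sentences:
        # A set ensures each sentence counts once per word, matching the
        # `key in sentence.split()` check in the nested-loop version.
        unknown = {w for w in sentence.split() if w not in sentiment_scores}
        if not unknown:
            continue
        val = get_sentiment(sentence)  # called once per sentence, not once per word
        for word in unknown:
            totals[word] += val
            counts[word] += 1

    return {word: round(totals[word] / counts[word], 1) for word in totals}
```

With this structure, get_sentiment runs at most once per sentence instead of once per (unknown word, sentence) pair, and each sentence is split once instead of once per unknown word.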