
I'm trying to count how many times a negative word from a list appears directly before a specific word. For example, given "This terrible laptop." with "laptop" as the specified word, I want the output to include "Terrible 1". I'm using Python.

def run(path):
    negWords={} #dictionary to return the count
    #load the negative lexicon
    negLex=loadLexicon('negative-words.txt')
    fin=open(path)

    for line in fin: #for every line in the file (1 review per line)
        line=line.lower().strip().split(' ')
        review_set=set() #Adding all the words in the review to a set

        for word in line: #Check if the word is present in the line
            review_set.add(word)  #As it is a set, only adds one time

        for word in review_set:
            if word in negLex:
                if word in negWords:
                    negWords[word]=negWords[word]+1
                else:
                    negWords[word] = 1

    fin.close()
    return negWords

if __name__ == "__main__": 
    print(run('textfile'))
    You need to explain what is wrong with your code (i.e. what you expected and what you got instead). Secondly, nowhere in the code do you specify the word of interest (laptop in your example). – Mohammed Elmahgiubi Feb 19 '19 at 20:47

2 Answers


It looks like you want to test a condition against pairs of consecutive words. Here is one way to do it: condition is checked against every pair of adjacent words in the text.

text = 'Do you like bananas? Not only do I like bananas, I love bananas!'
trigger_words = {'bananas'}
positive_words = {'like', 'love'}

def condition(w):
    return w[0] in positive_words and w[1] in trigger_words

for c in '.,?!':
    text = text.replace(c, '')

words = text.lower().split()

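# list() so the matches can be iterated more than once (a bare filter object is exhausted after one pass)
matches = list(filter(condition, zip(words, words[1:])))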
n_positives = 0
for w1, w2 in matches:
    print(f'{w1.upper()} {w2} => That\'s positive !')
    n_positives += 1
print(f'This text had a score of {n_positives}')

Output:

LIKE bananas => That's positive !
LIKE bananas => That's positive !
LOVE bananas => That's positive !
This text had a score of 3

Bonus:

  1. You can search for 3 consecutive words by changing zip(words, words[1:]) to zip(words, words[1:], words[2:]), with a condition that checks all 3 words.

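  2. You can get a counter dictionary by doing this (a filter object can only be iterated once, so build matches as a list first if you have already looped over it):

from collections import Counter
counter = Counter(i[0] for i in matches)  # Counter({'like': 2, 'love': 1})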
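
Putting the pieces together, here is a minimal sketch of the pair-matching idea wrapped in one function that returns the counter directly. The function name count_preceding and its parameter names are made up for illustration; the punctuation handling is the same simple replace as above, so it is only an approximation of real tokenising.

```python
from collections import Counter

def count_preceding(text, trigger_words, sentiment_words):
    """Count sentiment words that sit directly before a trigger word."""
    # Strip the same punctuation characters as in the snippet above.
    for c in '.,?!':
        text = text.replace(c, '')
    words = text.lower().split()
    # Build a real list of matching pairs so it can be reused safely.
    pairs = [(w1, w2) for w1, w2 in zip(words, words[1:])
             if w1 in sentiment_words and w2 in trigger_words]
    return Counter(w1 for w1, _ in pairs)

counts = count_preceding(
    'Do you like bananas? Not only do I like bananas, I love bananas!',
    trigger_words={'bananas'},
    sentiment_words={'like', 'love'},
)
print(counts)
```

For the question's case you would pass the negative lexicon as sentiment_words and {'laptop'} as trigger_words.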
Benoît P

This should do what you're looking for. It uses a set intersection to avoid some of the looping. The steps are:

  1. get the negative words in the line
  2. check the location of each word
  3. if the word after that location is 'laptop' record it

Note that this will only identify the first occurrence of a negative word in a line, so "terrible terrible laptop" will not be a match.

from collections import defaultdict

def run(path):

    negWords=defaultdict(int)  # A defaultdict(int) will start at 0, can just add.

    #load the negative lexicon
    negLex=loadLexicon('negative-words.txt')
    # If loadLexicon returns a list, convert it to a set for fast membership tests.
    negLex = set(negLex)

    fin=open(path)

    for line in fin: #for every line in the file (1 review per line)
        line=line.lower().strip().split(' ')

        # Passing a list to set() makes a set of its items.
        review_set = set(line)

        # Compare the review set against the neglex set. We want words that are in
        # *both* sets, so we can use intersection.
        neg_words_used = review_set & negLex

        # Is the bad word followed by the word laptop?            
        for word in neg_words_used:
            # Find the word in the line list
            ix = line.index(word)
            if ix > len(line) - 2:
                # Can't have laptop after it, it's the last word.
                continue

            # Count the word only if the next word in the line is laptop.
            if line[ix+1] == 'laptop':
                negWords[word] += 1

    fin.close()
    return negWords
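
If the first-occurrence limitation noted above matters, one option is to walk adjacent word pairs instead of using index(). This is only a sketch under some assumptions: the function name count_all_neg_before is hypothetical, and it takes an iterable of review lines and a ready-made set of negative words rather than calling loadLexicon or opening a file.

```python
from collections import defaultdict

def count_all_neg_before(lines, neg_lex, target='laptop'):
    """Count every negative word that directly precedes the target word."""
    neg_words = defaultdict(int)
    for line in lines:  # one review per line
        words = line.lower().split()
        # Pair each word with the word that follows it.
        for prev, cur in zip(words, words[1:]):
            if cur == target and prev in neg_lex:
                neg_words[prev] += 1
    return neg_words

reviews = [
    'This terrible terrible laptop broke',
    'An awful laptop and a terrible laptop',
]
result = count_all_neg_before(reviews, {'terrible', 'awful'})
print(dict(result))
```

Here "terrible terrible laptop" is counted once, for the second "terrible", because that is the one directly before "laptop".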

If you're only interested in words preceding the word 'laptop', a far more sensible approach would be to look for the word 'laptop', then check the word prior to that to see if it is a negative word. The following example does that.

  1. find laptop in the current line
  2. if laptop isn't in the line, or is the first word, skip the line
  3. get the word before laptop, check against the negative words
  4. if you have a match add it to our result

This avoids doing lookups for words which are not related to laptops.

from collections import defaultdict

def run(path):

    negWords=defaultdict(int)  # A defaultdict(int) will start at 0, can just add.

    #load the negative lexicon
    negLex=loadLexicon('negative-words.txt')
    # If loadLexicon returns a list, convert it to a set for fast membership tests.
    negLex = set(negLex)

    fin=open(path)

    for line in fin: #for every line in the file (1 review per line)
        line=line.lower().strip().split(' ')

        try:
            ix = line.index('laptop')
        except ValueError:
            # If we don't find laptop, continue to the next line.
            continue

        if ix == 0:
            # Laptop is the first word of the line, can't check prior word.
            continue


        previous_word = line[ix-1]

        if previous_word in negLex:
            # The word directly before laptop is a negative word.
            negWords[previous_word] += 1

    fin.close()
    return negWords
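
Two things to watch with the version above: index() only finds the first 'laptop' in a line, and split(' ') leaves punctuation attached, so the question's own example "This terrible laptop." yields the token 'laptop.' and is skipped. Below is a rough sketch that handles both. The function name count_neg_before_target is hypothetical, the regex is a deliberately simple tokeniser (letters and apostrophes only), and the lexicon is passed in as a set instead of being loaded from a file.

```python
import re
from collections import defaultdict

def count_neg_before_target(lines, neg_lex, target='laptop'):
    """Count negative words directly before each occurrence of target."""
    neg_words = defaultdict(int)
    for line in lines:  # one review per line
        # Keep only runs of letters/apostrophes, so 'laptop.' becomes 'laptop'.
        words = re.findall(r"[a-z']+", line.lower())
        for ix, word in enumerate(words):
            # ix > 0 guards against the target being the first word.
            if word == target and ix > 0 and words[ix - 1] in neg_lex:
                neg_words[words[ix - 1]] += 1
    return neg_words

print(dict(count_neg_before_target(['This terrible laptop.'], {'terrible'})))
```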
mfitzp