
I have written code to do sentiment analysis, so I use two different dictionaries in which sentences are tagged as negative or positive. My code snippet so far looks like this:

from nltk.tokenize import word_tokenize
from nltk.classify import NaiveBayesClassifier

def format_sentence(sentence):
    return {word: True for word in word_tokenize(sentence)}

pos_data = []
with open('Positiv.txt') as f:
    for line in f:
        pos_data.append([format_sentence(line), 'pos'])

neg_data = []
with open('Negativ.txt') as f:
    for line in f:
        neg_data.append([format_sentence(line), 'neg'])

training_data = pos_data[:3] + neg_data[:3]
test_data = pos_data[3:] + neg_data[3:]

model = NaiveBayesClassifier.train(training_data)

Now I would like the code to eliminate all stopwords from the sentences in the dictionary, but I don't know how to implement that in my code as I am a beginner in Python programming. I would be very thankful if anyone could help me with this :)

Tommy5
  • What is a "stopword", and how do you define "elimination"? – th3an0maly Apr 13 '16 at 13:25
  • Stopwords are words like 'and', 'but' and so on. I want the classifier not to include these kinds of words in the training data – Tommy5 Apr 13 '16 at 13:50
  • Possible duplicate of [Stopword removal with NLTK](http://stackoverflow.com/questions/19130512/stopword-removal-with-nltk) – alvas Apr 13 '16 at 18:10

2 Answers


It looks like you are using the Naive Bayes classifier implementation in NLTK. NLTK also has built-in stopword lists for several languages.

from nltk.corpus import stopwords
stops = stopwords.words('english')

def format_sentence(sentence):
    return {word: True for word in word_tokenize(sentence) if word not in stops}
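For illustration, here is a minimal, self-contained sketch of the same filtering idea, using `str.split()` and a small hand-written stopword set in place of NLTK's `word_tokenize` and stopword corpus (so it runs without downloading any NLTK data; the word list below is made up for the example):

```python
# Small hand-written stand-in for nltk.corpus.stopwords.words('english')
stops = {'and', 'but', 'the', 'is'}

def format_sentence(sentence):
    # Lowercase, split on whitespace, and drop any stopwords
    return {word: True for word in sentence.lower().split() if word not in stops}

print(format_sentence("The movie is great and fun"))
# {'movie': True, 'great': True, 'fun': True}
```

With the real NLTK tokenizer and stopword list, the structure is identical; only the tokenizer and the contents of `stops` change.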
aberger

If you are working with plain Python lists, try this code template, which creates a new list with the stopwords removed:

list_without_stopwords = [word for word in original_list if word not in stopword_list]
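A minimal, self-contained example of that template (both word lists here are invented for illustration):

```python
# Hypothetical example lists; any Python lists work the same way.
stopword_list = ['a', 'the', 'and', 'of']
original_list = ['the', 'cat', 'and', 'the', 'dog']

# Keep only the words that do not appear in the stopword list.
list_without_stopwords = [word for word in original_list if word not in stopword_list]
print(list_without_stopwords)  # ['cat', 'dog']
```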
mrEvgenX