
I have written code to do sentiment analysis, so I use two different dictionaries in which sentences are tagged as negative or positive. My code snippet so far looks like this:

from nltk.tokenize import word_tokenize
from nltk.classify import NaiveBayesClassifier

def format_sentence(sentence):
    return {word: True for word in word_tokenize(sentence)}

pos_data = []
with open('Positiv.txt') as f:
    for line in f:
        pos_data.append([format_sentence(line), 'pos'])

neg_data = []
with open('Negativ.txt') as f:
    for line in f:
        neg_data.append([format_sentence(line), 'neg'])

training_data = pos_data[:3] + neg_data[:3]
test_data = pos_data[3:] + neg_data[3:]

model = NaiveBayesClassifier.train(training_data)

Now I would like the code to eliminate all stopwords from the sentences in the dictionary, but I don't know how to implement that in my code as I am a beginner in Python programming. I would be very thankful if anyone could help me with this :)

Tommy5
  • What is a "stopword", and how do you define "elimination"? – th3an0maly Apr 13 '16 at 13:25
  • Stopwords are words like 'and', 'but' and so on. I want the classifier not to include these kinds of words in the training data – Tommy5 Apr 13 '16 at 13:50
  • Possible duplicate of [Stopword removal with NLTK](http://stackoverflow.com/questions/19130512/stopword-removal-with-nltk) – alvas Apr 13 '16 at 18:10

2 Answers


It looks like you are using the Naive Bayes classifier implementation in NLTK. NLTK also has built-in stopword lists for several languages.

from nltk.corpus import stopwords
stops = stopwords.words('english')

def format_sentence(sentence):
    return {word: True for word in word_tokenize(sentence) if word not in stops}
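For illustration, here is a minimal, self-contained sketch of the same filtering idea, using `str.split()` and a small hand-written stopword set in place of NLTK's `word_tokenize` and stopword corpus (so it runs without downloading any NLTK data; the word list below is made up for the example):

```python
# Small hand-written stand-in for nltk.corpus.stopwords.words('english')
stops = {'and', 'but', 'the', 'is'}

def format_sentence(sentence):
    # Lowercase, split on whitespace, and drop any stopwords
    return {word: True for word in sentence.lower().split() if word not in stops}

print(format_sentence("The movie is great and fun"))
# {'movie': True, 'great': True, 'fun': True}
```

With the real NLTK tokenizer and stopword list, the structure is identical; only the tokenizer and the contents of `stops` change.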
aberger

If you are working with plain Python lists, try this code template, which creates a new list with the stopwords removed:

list_without_stopwords = [word for word in original_list if word not in stopword_list]
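A minimal, self-contained example of that template (both word lists here are invented for illustration):

```python
# Hypothetical example lists; any Python lists work the same way.
stopword_list = ['a', 'the', 'and', 'of']
original_list = ['the', 'cat', 'and', 'the', 'dog']

# Keep only the words that do not appear in the stopword list.
list_without_stopwords = [word for word in original_list if word not in stopword_list]
print(list_without_stopwords)  # ['cat', 'dog']
```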
mrEvgenX