So I'm trying my hand at sentiment analysis. I've heard in lots of places that Naive Bayes is good enough, so I manually gathered some negative comments (~400). After cleaning up the comments file, I ended up with these most frequent words for negative comments:
negative_comments.most_common(40) #Similarly for positive..
[('never', 79),
('i', 63),
('restaurant', 51),
('it', 48),
('one', 47),
('get', 47),
('time', 43),
('would', 41),
('bad', 39),
('service', 38),
('don', 36),
('us', 36),
('work', 35),
('family', 35),
('day', 35),
('please', 32),
('stove', 32),
('you', 31),
('like', 31),
('got', 28),
('back', 27),
('customer', 27),
('years', 25),
('good', 25),
('people', 24),
('open', 24),
('online', 24),
('days', 23),
('right', 23),
('flea-market', 23),
('we', 21),
('way', 20)]
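To illustrate what I mean about the counts: most of the top entries are stopwords ('i', 'it', 'would', 'you', 'we', ...), not sentiment words. Here is a minimal sketch of filtering them out before counting; the stopword set is a tiny illustrative one (in practice something like NLTK's `stopwords.words('english')` would be used), and `negative_tokens` is a hypothetical stand-in for my cleaned token list:

```python
from collections import Counter

# Tiny stopword set for illustration only; a real run would use a
# full list such as nltk.corpus.stopwords.words('english').
STOPWORDS = {"i", "it", "one", "would", "don", "us", "you", "we", "the", "a", "to"}

# Hypothetical cleaned tokens standing in for the real comment file.
negative_tokens = ["never", "i", "bad", "service", "it", "bad", "awful", "we"]

# Count only the tokens that are not stopwords.
filtered = Counter(t for t in negative_tokens if t not in STOPWORDS)
print(filtered.most_common(3))
```

After filtering, words like "bad" and "awful" rise in the ranking instead of being buried under pronouns.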
As you can see, there's hardly any negative word among the most frequent ones. If I use these most frequent words to generate my features for Naive Bayes, I don't see how the classifier can perform well. I'd rather simply search for words like:
"dislike", "bad", "awful", "hate"...
and expect a better result than using Naive Bayes on the most frequent negative words. Is there a better approach than these methods?
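For context, this is the kind of setup I'm comparing against: a minimal Naive Bayes sketch with scikit-learn (an assumption on my part that this is the usual way, and the training texts below are toy data, not my real comments). The point is that the classifier weighs every vocabulary word per class itself, rather than me hand-picking the top frequent words of one class:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy labelled comments standing in for the real files.
texts = [
    "the service was awful and the food was bad",
    "terrible experience, i hate this restaurant",
    "great food and friendly service",
    "i love this place, wonderful staff",
]
labels = ["neg", "neg", "pos", "pos"]

# CountVectorizer drops common English stopwords and builds the full
# vocabulary; MultinomialNB then learns per-class word weights, so a
# rare-but-telling word like "awful" still gets a strong negative weight.
model = make_pipeline(
    CountVectorizer(stop_words="english"),
    MultinomialNB(),
)
model.fit(texts, labels)

print(model.predict(["the food was bad and awful"]))
```

With this setup the frequent-words problem mostly disappears, since stopword removal plus per-class likelihoods do the feature weighting that hand-picking words was trying to do.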