
When I input the sentence:

"So excited to be back! We're here to reconnect with & meet new innovators at ghc16"

the sentiment returned is negative. I can't understand why this is happening. The statement is positive, but it still returns a negative value.

    class SentimentAnalyzer {

        public TweetWithSentiment findSentiment(String line) {

            if (line == null || line.isEmpty()) {
                throw new IllegalArgumentException("The line must not be null or empty.");
            }

            Annotation annotation = processLine(line);

            int mainSentiment = findMainSentiment(annotation);

            if (mainSentiment < 0 || mainSentiment > 4) { // Review note: avoid magic numbers like 2 or 4; use named constants that explain what the values mean
                return null; // Review note: avoid returning null
            }

            return new TweetWithSentiment(line, toCss(mainSentiment));
        }

        private String toCss(int sentiment) {
            switch (sentiment) {
            case 0:
                return "very negative";
            case 1:
                return "negative";
            case 2:
                return "neutral";
            case 3:
                return "positive";
            case 4:
                return "very positive";
            default:
                return "default";
            }
        }

        private int findMainSentiment(Annotation annotation) {

            int mainSentiment = Integer.MIN_VALUE;
            int longest = Integer.MIN_VALUE;

            for (CoreMap sentence : annotation.get(CoreAnnotations.SentencesAnnotation.class)) {

                for (CoreLabel token : sentence.get(CoreAnnotations.TokensAnnotation.class)) {

                    String word = token.get(CoreAnnotations.TextAnnotation.class);
                    String pos = token.get(CoreAnnotations.PartOfSpeechAnnotation.class);
                    String ne = token.get(CoreAnnotations.NamedEntityTagAnnotation.class);
                    String lemma = token.get(CoreAnnotations.LemmaAnnotation.class);

                    System.out.println("word: " + word);
                    System.out.println("pos: " + pos);
                    System.out.println("ne: " + ne);
                    System.out.println("lemma: " + lemma);
                }

                int sentenceLength = sentence.toString().length();

                if (sentenceLength > longest) {

                    Tree tree = sentence.get(SentimentCoreAnnotations.SentimentAnnotatedTree.class);

                    mainSentiment = RNNCoreAnnotations.getPredictedClass(tree);

                    longest = sentenceLength;
                }
            }

            return mainSentiment;
        }

        private Annotation processLine(String line) {

            StanfordCoreNLP pipeline = createPipeline();

            return pipeline.process(line);
        }

        private StanfordCoreNLP createPipeline() {

            Properties props = createPipelineProperties();

            return new StanfordCoreNLP(props);
        }

        private Properties createPipelineProperties() {

            Properties props = new Properties();
            props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, sentiment");

            return props;
        }
    }

1 Answer


This is another case of a limitation of the NLP library itself, which mainly struggles with a few specific kinds of input:

  1. Ambiguous sentiment words - "This product works terribly" vs. "This product is terribly good"

  2. Missed negations - "I would never in a million years say that this product is worth buying"

  3. Quoted/Indirect text - "My dad says this product is terrible, but I disagree"

  4. Comparisons - "This product is about as useful as a hole in the head"

  5. Anything subtle - "This product is ugly, slow and uninspiring, but it's the only thing on the market that does the job"

In your example, there's nothing wrong with the algorithm. Let's analyse some parts of the text individually:

  • So excited to be back! -> positive
  • We're here to reconnect with -> neutral
  • meet new innovators at ghc16 -> neutral

A simple average would put this somewhere between neutral and positive. However, as we've seen, the algorithm isn't that predictable, which is why, if you add a single word to your text (the & is also not interpreted well):

So excited to be back! We're here to reconnect with you and meet new innovators at ghc16

... it returns neutral as the result.
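
To see this for yourself, here is a minimal sketch that reuses the same annotators and annotation classes as the code in the question to print the predicted class of each sentence separately, so you can spot which fragment pulls the overall result toward negative. The class name PerSentenceSentiment is just illustrative, and the pos/lemma/ner annotators are kept only to mirror the question's pipeline:

    import java.util.Properties;

    import edu.stanford.nlp.ling.CoreAnnotations;
    import edu.stanford.nlp.neural.rnn.RNNCoreAnnotations;
    import edu.stanford.nlp.pipeline.Annotation;
    import edu.stanford.nlp.pipeline.StanfordCoreNLP;
    import edu.stanford.nlp.sentiment.SentimentCoreAnnotations;
    import edu.stanford.nlp.trees.Tree;
    import edu.stanford.nlp.util.CoreMap;

    public class PerSentenceSentiment {

        public static void main(String[] args) {
            Properties props = new Properties();
            props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, sentiment");
            StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

            String tweet = "So excited to be back! We're here to reconnect with & meet new innovators at ghc16";
            Annotation annotation = pipeline.process(tweet);

            // 0 = very negative, 1 = negative, 2 = neutral, 3 = positive, 4 = very positive
            for (CoreMap sentence : annotation.get(CoreAnnotations.SentencesAnnotation.class)) {
                Tree tree = sentence.get(SentimentCoreAnnotations.SentimentAnnotatedTree.class);
                int predictedClass = RNNCoreAnnotations.getPredictedClass(tree);
                System.out.println(predictedClass + "\t" + sentence);
            }
        }
    }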


Suggestions:

  1. Do not treat sentiment class 1 as strictly negative, since you will keep running into situations like this one;
  2. If you control the input, make the text as clean and concise as possible in order to get better results;
  3. Split the text into sentences as much as you can and run the algorithm on each of them individually; then compute a custom average calibrated against your own test cases (a sketch follows this list).
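
As a rough illustration of suggestion 3, here is a hypothetical method (findAverageSentiment is not part of the question's code) that could be dropped into the SentimentAnalyzer class, using the same imports as above. It averages the per-sentence predicted classes instead of keeping only the longest sentence; how you map the average back to a label is something to calibrate on your own test cases:

    // Hypothetical alternative to findMainSentiment: averages the predicted
    // class (0-4) over all sentences instead of keeping only the longest one.
    private double findAverageSentiment(Annotation annotation) {
        int total = 0;
        int sentences = 0;
        for (CoreMap sentence : annotation.get(CoreAnnotations.SentencesAnnotation.class)) {
            Tree tree = sentence.get(SentimentCoreAnnotations.SentimentAnnotatedTree.class);
            total += RNNCoreAnnotations.getPredictedClass(tree);
            sentences++;
        }
        // Double.NaN signals "no sentences found"; the caller decides how to handle it.
        return sentences == 0 ? Double.NaN : (double) total / sentences;
    }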

If none of these works for you, consider switching to another machine-learning technique.

  • So does that mean '&' is interpreted as negative, and are there any other symbols with similar issues? – Xavier Dec 30 '16 at 06:36
  • It's not that they're negative, but their presence is confusing and the software may not understand what it means. That confusion tends to push the model down the negative path (and the absence of **you** makes it even more confusing). The point is: if you can, replace special chars. :) – diogo Jan 01 '17 at 19:25
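
Along the lines of that last comment, a purely illustrative pre-processing step could expand "&" to "and" and drop other symbols before the text reaches the pipeline. The helper name cleanLine and the exact set of characters to keep are assumptions you would need to tune:

    // Illustrative cleanup before calling pipeline.process(line): expand "&",
    // then drop characters outside letters, digits, whitespace and basic
    // punctuation, and collapse the resulting extra spaces.
    private String cleanLine(String line) {
        return line.replace("&", " and ")
                   .replaceAll("[^\\p{L}\\p{N}\\s.,!?']", " ")
                   .replaceAll("\\s+", " ")
                   .trim();
    }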