I'm working on a C++ Twitter company sentiment analysis tool. The user inputs a company name, and the tool analyzes a number of tweets and returns a sentiment.

So far I have done the following:

  1. limit tweets to recent ones in English
  2. convert everything to lowercase
  3. remove RT, the # symbol, @usernames and URLs
  4. remove characters like &^%$(){}... etc
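
The cleanup steps above could look roughly like the following sketch. The function name `cleanTweet` is hypothetical, and the code assumes ASCII-oriented processing: bytes that belong to multi-byte UTF-8 characters (emoji, for example) are simply dropped. The language and recency filter from step 1 is assumed to happen when the tweets are fetched, so it is not shown.

```cpp
#include <cctype>
#include <sstream>
#include <string>

// Hypothetical helper: returns a cleaned, lowercase copy of a tweet.
// Assumption: ASCII-oriented; non-ASCII bytes are discarded.
std::string cleanTweet(const std::string& tweet) {
    std::istringstream in(tweet);
    std::string token, out;
    while (in >> token) {
        // Step 2: lowercase the token.
        for (char& c : token)
            c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));

        // Step 3: drop retweet markers, @usernames and URLs.
        if (token == "rt" || token[0] == '@' ||
            token.rfind("http://", 0) == 0 || token.rfind("https://", 0) == 0)
            continue;

        // Steps 3 and 4: keep letters, digits and apostrophes only,
        // which strips '#' and the remaining punctuation.
        std::string word;
        for (char c : token)
            if (std::isalnum(static_cast<unsigned char>(c)) || c == '\'')
                word += c;

        if (word.empty()) continue;
        if (!out.empty()) out += ' ';
        out += word;
    }
    return out;
}
```

For instance, `cleanTweet("RT @user: I LOVE my new #iPhone!!! http://t.co/abc")` yields `i love my new iphone`. Note that this also strips text emoticons such as `:)`, which may not be desirable.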

I then split each tweet into words and check each word against two dictionaries, one of positive and one of negative words. I compute a total sentiment for each tweet, then count the number of positive, neutral and negative tweets to come up with a final answer. No weights are used.
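
As a rough illustration of this unweighted scheme, here is a minimal sketch. The names `Sentiment` and `classifyTweet` are made up for the example, and it assumes the tweet has already been cleaned and lowercased and that the dictionaries are held in `std::unordered_set`.

```cpp
#include <sstream>
#include <string>
#include <unordered_set>

enum class Sentiment { Negative, Neutral, Positive };

// Hypothetical helper: classifies an already-cleaned tweet by counting
// dictionary hits. Each positive word adds 1, each negative word
// subtracts 1; no weights are involved.
Sentiment classifyTweet(const std::string& cleaned,
                        const std::unordered_set<std::string>& positive,
                        const std::unordered_set<std::string>& negative) {
    int score = 0;
    std::istringstream in(cleaned);
    std::string word;
    while (in >> word) {
        if (positive.count(word)) ++score;
        else if (negative.count(word)) --score;
    }
    if (score > 0) return Sentiment::Positive;
    if (score < 0) return Sentiment::Negative;
    return Sentiment::Neutral;  // no hits, or hits that cancel out
}
```

A tweet ends up neutral both when it contains no dictionary words at all and when its positive and negative hits cancel out, so counting those two cases separately may help explain a high neutral share.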

I am thinking of implementing the following two things:

  1. remove stop words from tweets
  2. remove special characters and emoticons from tweets (basically non-English Unicode)

However, even with this, most of the searches end up being very neutral. For example, if I search for "Apple" in 100 tweets, I get, say, 30 positive, 10 negative and 60 neutral.

Questions:
1. Is there any way to lower the number of neutrals?
2. What kind of positive and negative words should I add to represent my search criteria (companies)?

  • You say that you `remove special characters and emoticons from tweets`, why not analyze emoticons too? Happy face = positive sentiment – Keatinge Jun 03 '16 at 20:26
  • If you classify these tweets by hand, do you get a vastly different result? (Scrolling through an "Apple" search, most tweets I see *are* very neutral.) – molbdnilo Jun 03 '16 at 20:32
  • Possible duplicate: http://stackoverflow.com/questions/10416343/how-to-tackle-twitter-sentiment-analysis – Thomas Matthews Jun 03 '16 at 20:33
  • See also: http://stackoverflow.com/questions/4199441/best-algorithmic-approach-to-sentiment-analysis – Thomas Matthews Jun 03 '16 at 20:33
  • @Keatinge Regular emoticons would be easy, like :) or :( or :D. However, most tweets nowadays contain emoji. I wrote "emoticons" but should have said "emoji"; those show up as weird characters in the text. – Alexandru Lucian Susma Jun 03 '16 at 22:51
  • @molbdnilo You are right, most of them are neutral. So does this mean it is hard to judge based only on tweets? This was the task I was given... I guess in my report I would mention that while most of them are neutral, we can still see more positives than negatives, or vice versa. – Alexandru Lucian Susma Jun 03 '16 at 22:52

1 Answer

You say no weighting is used, but why not add it? Assign each +/- word a base weight of 1, then maybe apply some of the following conditions:

  1. If the tweet uses words like "very", "extremely", etc., weight the following adjective more heavily (or, without weighting, just count both of them as +/- words).
  2. Rather than changing everything to lowercase, weight words written in all caps more heavily by applying a multiplier.
  3. Rate words like "fantastic" more heavily than words like "good".
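
A minimal sketch of how these three conditions might be combined is shown below. Everything specific in it is an assumption made for illustration: the function name `weightedScore`, the two 1.5 multipliers, and the idea of storing signed per-word weights in a map. It expects the original-case tweet text so that all-caps words can still be detected.

```cpp
#include <cctype>
#include <sstream>
#include <string>
#include <unordered_map>
#include <unordered_set>

// Hypothetical weighted scorer. `lexicon` maps lowercase words to signed
// weights (e.g. good = 1, fantastic = 2, bad = -1); `intensifiers` holds
// words such as "very". The multipliers are arbitrary example values.
double weightedScore(const std::string& tweet,
                     const std::unordered_map<std::string, double>& lexicon,
                     const std::unordered_set<std::string>& intensifiers) {
    const double intensifierBoost = 1.5;  // assumption
    const double capsBoost = 1.5;         // assumption
    double total = 0.0;
    bool boostNext = false;

    std::istringstream in(tweet);
    std::string token;
    while (in >> token) {
        // Check for all caps BEFORE lowercasing, keeping letters only.
        std::string word;
        bool allUpper = true;
        for (char ch : token) {
            unsigned char c = static_cast<unsigned char>(ch);
            if (!std::isalpha(c)) continue;
            if (!std::isupper(c)) allUpper = false;
            word += static_cast<char>(std::tolower(c));
        }
        if (word.empty()) continue;
        bool shouted = allUpper && word.size() > 1;  // ignore "I", "A"

        if (intensifiers.count(word)) {  // condition 1
            boostNext = true;
            continue;
        }
        auto it = lexicon.find(word);
        if (it != lexicon.end()) {
            double w = it->second;       // condition 3: per-word weight
            if (boostNext) w *= intensifierBoost;
            if (shouted) w *= capsBoost; // condition 2
            total += w;
        }
        boostNext = false;
    }
    return total;
}
```

With a lexicon of `good = 1`, `fantastic = 2`, `bad = -1` and the intensifier `very`, the inputs `"very good"` and `"GOOD"` both score 1.5, while `"very BAD"` scores -2.25. The sign of the returned score can then be used to label the tweet, as in the unweighted approach.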
Russ
  • Sounds like a good idea, but I am not sure I will have time to implement a weight system. I have to present something by the end of the weekend. I made them lowercase to simplify comparisons. Also, I was thinking that repeated letters convey a different sentiment, e.g. "The movie was good." versus "The movie was gooooooooooodddddddddd :)" – Alexandru Lucian Susma Jun 03 '16 at 22:54
  • You could also check if the word was all upper case before converting it to lowercase, and then increase the weight. For example, "I absolutely love programming" is a bit less enthusiastic than "I absolutely LOVE programming". – Noctiphobia Jun 04 '16 at 01:15
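
On the repeated-letters idea raised in the comments, one possible approach is sketched below. It is only a heuristic under stated assumptions: a word counts as "elongated" if any character repeats three or more times in a row, and an elongated word is matched to a dictionary entry by collapsing every run of repeated characters in both. The helper names `squeeze`, `isElongated` and `matchesElongated` are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Collapses every run of identical characters to a single character,
// e.g. "gooood" -> "god".
std::string squeeze(const std::string& word) {
    std::string out;
    for (char c : word)
        if (out.empty() || out.back() != c) out += c;
    return out;
}

// True if some character repeats three or more times in a row.
bool isElongated(const std::string& word) {
    std::size_t run = 1;
    for (std::size_t i = 1; i < word.size(); ++i) {
        run = (word[i] == word[i - 1]) ? run + 1 : 1;
        if (run >= 3) return true;
    }
    return false;
}

// Heuristic: an elongated word matches a dictionary entry when both
// collapse to the same squeezed form. Non-elongated words should be
// looked up directly instead.
bool matchesElongated(const std::string& word, const std::string& entry) {
    return isElongated(word) && squeeze(word) == squeeze(entry);
}
```

Here `matchesElongated("gooooooooooodddddddddd", "good")` is true, so the word could be counted as "good", perhaps with a higher weight. The squeezed form is ambiguous ("good" and "god" both collapse to "god"), so this is best used only as a fallback when the direct dictionary lookup fails.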