16

I am working on sentiment analysis using the dataset at this link: http://www.cs.jhu.edu/~mdredze/datasets/sentiment/index2.html. I split the dataset 50:50, with 50% of the samples used for training and 50% for testing. I extract features from the training samples and classify with a Weka classifier, but my prediction accuracy is only about 70-75%.

Can anybody suggest other datasets that could help me improve the result? I have used unigrams, bigrams and POS tags as my features.

Keeth
user3512562

3 Answers

26

There are many sources of sentiment analysis datasets; several are linked in the comments below.

That said, a different dataset will not necessarily improve accuracy on your current one, because the new corpus may be very different from yours. Apart from reducing the test percentage relative to training, you could try other classifiers, or fine-tune the hyperparameters with a semi-automated wrapper such as CVParameterSelection or GridSearch, or even Auto-WEKA if it fits.
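To illustrate what an exhaustive hyperparameter search does, here is a minimal sketch in plain Python. It is not Weka's CVParameterSelection or GridSearch; the names `grid_search`, `toy_score`, `C` and `ngram_max` are hypothetical, and `toy_score` only stands in for "train a classifier with these settings and return its validation accuracy".

```python
from itertools import product


def grid_search(param_grid, evaluate):
    """Try every combination in param_grid; return (best_params, best_score).

    param_grid: dict mapping a parameter name to a list of candidate values.
    evaluate:   callable that takes a dict of parameters and returns a score
                (higher is better), e.g. mean cross-validated accuracy.
    """
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        # Keep the first combination that reaches the highest score.
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    # Hypothetical scoring function, peaked at C=1.0 and ngram_max=2.
    # In practice this would train and validate a real classifier.
    return -(params["C"] - 1.0) ** 2 - (params["ngram_max"] - 2) ** 2


best, score = grid_search(
    {"C": [0.1, 1.0, 10.0], "ngram_max": [1, 2, 3]},
    toy_score,
)
print(best, score)
```

The score passed back by `evaluate` should come from validation data or cross-validation on the training data, never from the final test set, otherwise the tuned accuracy will be optimistic.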

A 50/50 split is quite rare; 80/20 is a much more common ratio. A better practice is to use 60% for training, 20% for validation and 20% for testing.
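A 60/20/20 split can be sketched in a few lines of plain Python. The function name `split_60_20_20`, the fixed seed and the toy `reviews` list are assumptions made for illustration; they are not part of Weka or of the dataset above.

```python
import random


def split_60_20_20(samples, seed=0):
    """Shuffle the samples and return (train, validation, test) lists."""
    shuffled = list(samples)
    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = n * 6 // 10
    n_val = n * 2 // 10
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains, about 20%
    return train, validation, test


# Toy stand-in for labelled reviews: (text, label) pairs.
reviews = [("review %d" % i, i % 2) for i in range(100)]
train, validation, test = split_60_20_20(reviews)
print(len(train), len(validation), len(test))
```

The validation part is used to compare classifiers and hyperparameters; the test part is touched only once, to report the final accuracy.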

Abhay Gupta
doxav
  • As you said, if I reduce the training percentage it will affect the learning process, i.e. learning from fewer samples will be harder. But if I increase the training percentage it will cause overfitting... that's why I chose a 50:50 ratio. – user3512562 Jul 08 '14 at 08:49
  • 1
    A 50/50 split is quite rare; 80/20 is a much more common ratio. A better practice is to use 60% for training, 20% for validation and 20% for testing. PS: I just remembered this huge n-gram dataset from Google: http://storage.googleapis.com/books/ngrams/books/datasetsv2.html – doxav Jul 08 '14 at 09:46
  • 2
    The following contains more than 1,578,627 classified datasets http://thinknook.com/wp-content/uploads/2012/09/Sentiment-Analysis-Dataset.zip or http://ai.stanford.edu/~amaas/data/sentiment/ – Kheshav Sewnundun Nov 08 '15 at 18:39
  • https://www.kaggle.com/bittlingmayer/amazonreviews – Adam Bittlingmayer Mar 12 '19 at 11:58
3

I have started gathering sentiment analysis tools, datasets and lexicons in one place, which could be useful for you too: https://github.com/laugustyniak/awesome-sentiment-analysis

Open a PR if you want to add something, or just write to me. I have worked a lot with Amazon data (millions of reviews).

l.augustyniak
0

Here is a list of datasets that give the sentiment of individual words: http://positivewordsresearch.com/sentiment-analysis-resources/

Default picture
  • 2
    While this link may answer the question, it is better to include the essential parts of the answer here and provide the link for reference. Link-only answers can become invalid if the linked page changes. – Ted Klein Bergman May 27 '18 at 19:11
  • I will try to put the links here if I get a chance – Default picture May 27 '18 at 19:13