28

I'm implementing a system that detects human emotion in text. Are there any manually annotated datasets available for supervised learning and testing?

Here are some interesting datasets: https://dataturks.com/projects/trending

NooB8374
ekka
  • If you're talking about sentiment detection/opinion mining: yes, there have been a number of shared tasks. I'm pretty sure Google will find some of them. – lenz Jun 08 '15 at 08:49
  • Actually, not sentiment. I want a dataset that categorizes words into different emotions. I tried Google but didn't find a good one. – ekka Jun 08 '15 at 13:30
  • Are you looking for something like death: feeling(sad), birth: feeling(happy)? – Ankit Solanki Jun 09 '15 at 16:35
  • You can check out some interesting datasets here: https://dataturks.com/projects/trending – NooB8374 Jun 07 '18 at 16:56
  • The above dataturks link is not working. Can you please point to updated link? – jkr Feb 10 '21 at 05:35

1 Answer

49

The field of textual emotion detection is still very new, and the literature is fragmented across many journals in different fields. It's really hard to get a good overview of what's out there.

Note that there are several emotion theories in psychology. Hence there are different ways of modeling/representing emotions in computing. Most of the time, "emotion" refers to a phenomenon such as anger, fear or joy. Other theories state that all emotions can be represented in a multi-dimensional space (so there is an infinite number of them).
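To make the two modeling styles concrete, here is a minimal Python sketch. The class names `BasicEmotion` and `VAD` and the example values are purely illustrative assumptions; they do not come from any particular library or data set.

```python
from dataclasses import dataclass
from enum import Enum


class BasicEmotion(Enum):
    """Categorical view: a small, fixed set of emotion classes."""
    ANGER = "anger"
    FEAR = "fear"
    JOY = "joy"


@dataclass(frozen=True)
class VAD:
    """Dimensional view: a point in a continuous affect space."""
    valence: float    # unpleasant .. pleasant
    arousal: float    # calm .. excited
    dominance: float  # being controlled .. being in control


# Hypothetical annotations of one sentence under both schemes.
categorical_label = BasicEmotion.JOY
dimensional_label = VAD(valence=4.2, arousal=3.6, dominance=3.4)
```

With a categorical scheme the learning task is typically classification, whereas a dimensional scheme usually leads to regression on each axis.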

Here are some (publicly available) data sets I know of (updated):

  1. EmoBank. 10k sentences annotated with Valence, Arousal and Dominance values (disclosure: I am one of the authors). https://github.com/JULIELab/EmoBank

  2. The "Emotion Intensity in Tweets" data set from the WASSA 2017 shared task. http://saifmohammad.com/WebPages/EmotionIntensity-SharedTask.html

  3. The Valence and Arousal Facebook Posts by Preotiuc-Pietro and others: http://wwbp.org/downloads/public_data/dataset-fb-valence-arousal-anon.csv

  4. The Affect data by Cecilia Ovesdotter Alm: http://people.rc.rit.edu/~coagla/affectdata/index.html

  5. The Emotion in Text data set by CrowdFlower https://www.crowdflower.com/wp-content/uploads/2016/07/text_emotion.csv

  6. ISEAR: http://emotion-research.net/toolbox/toolboxdatabase.2006-10-13.2581092615

  7. Test Corpus of SemEval 2007 (Task on Affective Text) http://web.eecs.umich.edu/~mihalcea/downloads.html

  8. A reannotation of the SemEval Stance data with emotions: http://www.ims.uni-stuttgart.de/data/ssec
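Several of the data sets above ship as plain CSV files, so a first look needs nothing beyond the standard library. The sketch below parses a few made-up rows; the column names (`id`, `V`, `A`, `D`, `text`) and the values are assumptions for illustration only, so check the header of whichever file you actually download.

```python
import csv
import io
from statistics import mean

# Made-up rows in an assumed layout; real files may use other columns.
SAMPLE_CSV = """id,V,A,D,text
s1,4.0,3.5,3.2,"What a wonderful surprise!"
s2,1.5,4.0,2.0,"I can't believe they did this to us."
s3,3.0,2.0,3.0,"The meeting is at noon."
"""


def load_vad_rows(handle):
    """Parse CSV rows into dicts with float V/A/D scores."""
    rows = []
    for row in csv.DictReader(handle):
        rows.append({
            "id": row["id"],
            "text": row["text"],
            "V": float(row["V"]),
            "A": float(row["A"]),
            "D": float(row["D"]),
        })
    return rows


rows = load_vad_rows(io.StringIO(SAMPLE_CSV))
mean_valence = mean(r["V"] for r in rows)
```

For a real file, pass `open(path, newline="", encoding="utf-8")` instead of the `io.StringIO` object.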

If you want to go deeper into the topic, here are some surveys I recommend (disclosure: I authored the first one).

  1. Buechel, S., & Hahn, U. (2016). Emotion Analysis as a Regression Problem — Dimensional Models and Their Implications on Emotion Representation and Metrical Evaluation. In ECAI 2016: 22nd European Conference on Artificial Intelligence (pp. 1114–1122). The Hague, Netherlands (available: http://ebooks.iospress.nl/volumearticle/44864).

  2. Canales, L., & Martínez-Barco, P. (n.d.). Emotion Detection from text: A Survey. Processing in the 5th Information Systems Research Working Days (JISIC 2014), 37 (available: http://www.aclweb.org/anthology/W14-6905).

Tom Aranda
buechel
  • 3
    Another possibly useful resource is Saif Mohammad's [hash tag emotion corpus](http://saifmohammad.com/WebDocs/Jan9-2012-tweets-clean.txt.zip). For more info on the data, see [this page](http://saifmohammad.com/WebPages/lexicons.html) – drevicko Sep 18 '16 at 11:45
  • @beuchel do you know of a resource that I could use to convert your EmoBank valence and arousal labels to standard emotion labels? I imagine that simply using thresholds on the axis values and mapping the various regions to the different emotions should work. I can't seem to find the values that I should use for thresholding, though. – Siddharth Kumar Jun 15 '17 at 18:05
  • 1
    @SiddharthKumar I guess you could really use any machine learning technique to do that. There is actually an experiment about it described in the paper. – buechel Jun 16 '17 at 17:14
  • 1
    @buechel I was planning to do that but thought I'd ask the expert if there are universally agreed upon thresholds for deciding if a certain area in the valence arousal plane represents an emotion. Regarding a classifier that takes in valence/arousal vectors and outputs an emotion, where might I find training data for this simple task? Your repository mentions that a subset of the data is annotated with the standard emotions but I can't seem to find that dataset. Maybe I'm missing something. – Siddharth Kumar Jun 16 '17 at 18:39
  • @SiddharthKumar I'm pretty sure that there are no generally agreed upon thresholds. This whole mapping process is a very recent research result after all. The data set I used is the one from SemEval 2007, Task 14. You can look into the paper, in case you need a citation. Here is the link to the data set http://nlp.cs.swarthmore.edu/semeval/tasks/task14/data.shtml – buechel Jun 17 '17 at 07:41
  • 2
    I have a few remarks on @buechel's answer. 1. ISEAR is no longer available at the referenced URL. 2. I recommend checking the paper "An Analysis of Annotated Corpora for Emotion Classification in Text" for additional and updated emotion detection datasets: https://aclweb.org/anthology/C18-1179 – revy Apr 18 '19 at 15:33
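On the question raised in the comments about turning valence/arousal scores into discrete emotion labels: since there are no agreed-upon thresholds, one option is to learn the mapping from doubly annotated examples. The sketch below uses a nearest-centroid rule as a stand-in for "any machine learning technique"; it is not the method from the paper, and the training points and label names are invented for illustration.

```python
import math
from collections import defaultdict

# Hypothetical (valence, arousal) points with emotion labels.
TRAIN = [
    ((4.5, 4.0), "joy"),
    ((4.0, 3.5), "joy"),
    ((1.5, 4.5), "anger"),
    ((2.0, 4.0), "anger"),
    ((1.5, 1.5), "sadness"),
    ((2.0, 2.0), "sadness"),
]


def fit_centroids(samples):
    """Average the (valence, arousal) points of each label."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for (v, a), label in samples:
        acc = sums[label]
        acc[0] += v
        acc[1] += a
        acc[2] += 1
    return {label: (sv / n, sa / n) for label, (sv, sa, n) in sums.items()}


def predict(centroids, point):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], point))


centroids = fit_centroids(TRAIN)
print(predict(centroids, (4.4, 3.9)))
```

In practice the training pairs would come from a corpus that carries both kinds of annotation, and a stronger classifier could replace the centroid rule without changing the overall shape of the code.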