Data sets for emotion detection in text

Question

I'm implementing a system that detects human emotion in text. Are there any manually annotated data sets available for supervised learning and testing?

Here are some interesting datasets: https://dataturks.com/projects/trending

If you're talking about sentiment detection/opinion mining: yes, there have been a number of shared tasks. I'm pretty sure Google will find some of them. — lenz, Jun 08 '15 at 08:49
Actually, not sentiment. I want a dataset that categorizes words into different emotions. I tried Google but didn't find a good one. — ekka, Jun 08 '15 at 13:30
Are you looking for something like death: feeling(sad), birth: feeling(happy)? — Ankit Solanki, Jun 09 '15 at 16:35
You can check out some interesting datasets here: https://dataturks.com/projects/trending — NooB8374, Jun 07 '18 at 16:56
The above dataturks link is not working. Can you please point to an updated link? — jkr, Feb 10 '21 at 05:35

score 49 · Answer 1 · edited Dec 12 '17 at 15:31

The field of textual emotion detection is still very new, and the literature is scattered across many journals in different fields. It's really hard to get a good overview of what's out there.

Note that there are several emotion theories in psychology. Hence there are different ways of modeling/representing emotions in computing. Most of the time, "emotion" refers to a phenomenon such as anger, fear or joy. Other theories state that all emotions can be represented in a multi-dimensional space (so there is an infinite number of them).
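To make the two views concrete, here is a minimal Python sketch of how each might be represented in code. The class names, the particular label set (sadness is added purely for illustration), and the 1-5 scale with made-up values are all assumptions, not taken from any specific data set.

```python
from dataclasses import dataclass
from enum import Enum


class BasicEmotion(Enum):
    """Categorical view: a text gets one label from a fixed set.

    The label set here is illustrative only.
    """
    ANGER = "anger"
    FEAR = "fear"
    JOY = "joy"
    SADNESS = "sadness"


@dataclass(frozen=True)
class VAD:
    """Dimensional view: a point in a continuous
    Valence-Arousal-Dominance space."""
    valence: float
    arousal: float
    dominance: float


# Hypothetical annotations of the same sentence under each scheme.
categorical_label = BasicEmotion.JOY
# Made-up values; a 1-5 rating scale is assumed here.
dimensional_label = VAD(valence=4.2, arousal=3.8, dominance=3.5)
```

Which representation a data set uses determines whether you are facing a classification problem (categorical labels) or a regression problem (dimensional values).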

Here are some (publicly available) data sets I know of (updated):

EmoBank. 10k sentences annotated with Valence, Arousal and Dominance values (disclosure: I am one of the authors). https://github.com/JULIELab/EmoBank
The "Emotion Intensity in Tweets" data set from the WASSA 2017 shared task. http://saifmohammad.com/WebPages/EmotionIntensity-SharedTask.html
The Valence and Arousal Facebook Posts by Preotiuc-Pietro and others: http://wwbp.org/downloads/public_data/dataset-fb-valence-arousal-anon.csv
The Affect data by Cecilia Ovesdotter Alm: http://people.rc.rit.edu/~coagla/affectdata/index.html
The Emotion in Text data set by CrowdFlower: https://www.crowdflower.com/wp-content/uploads/2016/07/text_emotion.csv
ISEAR: http://emotion-research.net/toolbox/toolboxdatabase.2006-10-13.2581092615
Test Corpus of SemEval 2007 (Task on Affective Text): http://web.eecs.umich.edu/~mihalcea/downloads.html
A reannotation of the SemEval Stance data with emotions: http://www.ims.uni-stuttgart.de/data/ssec

If you want to go deeper into the topic, here are some surveys I recommend (disclosure: I authored the first one).

Buechel, S., & Hahn, U. (2016). Emotion Analysis as a Regression Problem — Dimensional Models and Their Implications on Emotion Representation and Metrical Evaluation. In ECAI 2016: 22nd European Conference on Artificial Intelligence (pp. 1114–1122). The Hague, Netherlands (available: http://ebooks.iospress.nl/volumearticle/44864).
Canales, L., & Martínez-Barco, P. (2014). Emotion Detection from text: A Survey. In Proceedings of the Workshop on Natural Language Processing in the 5th Information Systems Research Working Days (JISIC 2014), 37 (available: http://www.aclweb.org/anthology/W14-6905).

Another possibly useful resource is Saif Mohammad's [hashtag emotion corpus](http://saifmohammad.com/WebDocs/Jan9-2012-tweets-clean.txt.zip). For more info on the data, see [this page](http://saifmohammad.com/WebPages/lexicons.html) — drevicko, Sep 18 '16 at 11:45
@buechel do you know of a resource that I could use to convert your EmoBank valence and arousal labels to standard emotion labels? I imagine that simply using thresholds on the axis values and mapping the various regions to the different emotions should work. I can't seem to find the values that I should use for thresholding, though. — Siddharth Kumar, Jun 15 '17 at 18:05
@SiddharthKumar I guess you could really use any machine learning technique to do that. There is actually an experiment about it described in the paper. — buechel, Jun 16 '17 at 17:14
@buechel I was planning to do that but thought I'd ask the expert if there are universally agreed upon thresholds for deciding if a certain area in the valence arousal plane represents an emotion. Regarding a classifier that takes in valence/arousal vectors and outputs an emotion, where might I find training data for this simple task? Your repository mentions that a subset of the data is annotated with the standard emotions but I can't seem to find that dataset. Maybe I'm missing something. — Siddharth Kumar, Jun 16 '17 at 18:39
@SiddharthKumar I'm pretty sure that there are no generally agreed upon thresholds. This whole mapping process is a very recent research result, after all. The data set I used is the one from SemEval 2007, Task 14. You can look into the paper, in case you need a citation. Here is the link to the data set: http://nlp.cs.swarthmore.edu/semeval/tasks/task14/data.shtml — buechel, Jun 17 '17 at 07:41
I have a few remarks on @buechel's answer. 1. ISEAR is not available under the referred URL anymore. 2. I recommend checking the paper "An Analysis of Annotated Corpora for Emotion Classification in Text" for additional and updated emotion detection datasets: https://aclweb.org/anthology/C18-1179 — revy, Apr 18 '19 at 15:33
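
On the valence/arousal-to-label mapping discussed in the comments above: since there are no agreed-upon thresholds, one simple option is to learn the regions from data that carries both kinds of annotation. The following is only a sketch of that idea using a nearest-centroid rule. The function names are hypothetical, the training pairs are made up, and this is not the method from the paper.

```python
import math
from collections import defaultdict


def fit_centroids(samples):
    """Average the (valence, arousal) points of each emotion label."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for (valence, arousal), label in samples:
        acc = sums[label]
        acc[0] += valence
        acc[1] += arousal
        acc[2] += 1
    return {label: (v / n, a / n) for label, (v, a, n) in sums.items()}


def predict(centroids, point):
    """Return the label whose centroid is closest (Euclidean) to the point."""
    return min(centroids, key=lambda label: math.dist(centroids[label], point))


# Made-up training pairs for illustration only (a 1-5 scale is assumed).
# In practice these would come from a corpus annotated with both schemes.
toy_data = [
    ((4.5, 3.8), "joy"),
    ((4.1, 3.2), "joy"),
    ((1.6, 4.4), "anger"),
    ((1.9, 4.0), "anger"),
    ((1.7, 1.8), "sadness"),
    ((2.1, 1.5), "sadness"),
]

centroids = fit_centroids(toy_data)
print(predict(centroids, (4.0, 3.5)))
```

Any other classifier could be swapped in for the nearest-centroid rule; the point is that the decision regions are estimated from doubly annotated data rather than hard-coded. Note that `math.dist` requires Python 3.8 or later.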
