Sentiment Classification from own Text Data using NLTK

Question

My question may sound very similar to the post "Sentiment analysis with NLTK python for sentences using sample data or webservice?", but I have already finished parsing and tokenizing the sentences in my text. My questions are:

1. Of the examples I have seen so far, the NLTK movie review example seems closest to my problem. However, for movie_reviews the training text is already organized: there are two folders, pos and neg, and the texts are stored in them. How can I set up the same classification for my own large body of text? Do I read the data manually and sort it into two folders? Does that make a corpus? After that, can I work with it just like the movie_reviews data in the example?

2. If the answer to the above question is yes, is there a tool that can speed up this task? For example, I want to work only with the texts that have "Monty Python" in their content. I would then classify them manually and store them in the pos and neg folders. Does that work?

Please help me

score 3 · Answer 1 · answered May 20 '12 at 23:52

Yes, you need a training corpus to train a classifier. Or you need some other way to detect sentiment.
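For the folder layout described in the question, here is a minimal sketch of how hand-labelled texts in pos and neg folders could be read back and used for training. It assumes NLTK is installed; the toy documents, the temporary directory, and the helper names (`write_corpus`, `bag_of_words`, `train_from_folders`) are made up for illustration, and the bag-of-words features are just one simple choice, not necessarily what the movie review example uses.

```python
import os
import tempfile

from nltk.classify import NaiveBayesClassifier
from nltk.corpus.reader import CategorizedPlaintextCorpusReader

# Hypothetical toy documents standing in for your own hand-labelled texts.
SAMPLES = {
    "pos": ["a wonderful and funny sketch", "great cast and a wonderful script"],
    "neg": ["a dull and boring sketch", "terrible pacing and a boring script"],
}


def write_corpus(root, samples):
    """Write each document to <root>/<label>/doc<n>.txt."""
    for label, docs in samples.items():
        os.makedirs(os.path.join(root, label), exist_ok=True)
        for i, text in enumerate(docs):
            path = os.path.join(root, label, "doc%d.txt" % i)
            with open(path, "w", encoding="utf-8") as handle:
                handle.write(text)


def bag_of_words(words):
    """Presence features: every lowercased token maps to True."""
    return {word.lower(): True for word in words}


def train_from_folders(root):
    """Read the pos/neg folders as a categorized corpus and train Naive Bayes."""
    reader = CategorizedPlaintextCorpusReader(
        root,
        r"(pos|neg)/.*\.txt",        # which files belong to the corpus
        cat_pattern=r"(pos|neg)/.*",  # the folder name is the category
    )
    train_set = [
        (bag_of_words(reader.words(fileid)), category)
        for category in reader.categories()
        for fileid in reader.fileids(category)
    ]
    return reader, NaiveBayesClassifier.train(train_set)


corpus_root = tempfile.mkdtemp()
write_corpus(corpus_root, SAMPLES)
reader, classifier = train_from_folders(corpus_root)

# Classify a new, unseen piece of text with the same feature function.
prediction = classifier.classify(bag_of_words("a wonderful funny script".split()))
print(reader.categories(), prediction)
```

With real data you would point `train_from_folders` at your own directory instead of a temporary one, and hold back some files to check accuracy.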

To create a training corpus, you can classify the texts by hand, have others classify them for you (Amazon Mechanical Turk is popular for this), or do corpus bootstrapping. For sentiment, bootstrapping could involve creating two lists of keywords: positive words and negative words. Using those lists, you can create an initial training corpus automatically, correct it by hand, and then train a classifier. This is an iterative process, and the key thing to remember is "garbage in, garbage out". In other words, if your training corpus is wrong, you can't expect your classifier to be right.
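The bootstrapping step can be sketched roughly as follows, using only the standard library. The keyword lists, the `margin` threshold, and the function names are assumptions chosen for illustration; real lists would be much longer. The optional phrase filter mirrors the "Monty Python" restriction from the question.

```python
import re

# Hypothetical seed lists; in practice they would be much longer.
POSITIVE_WORDS = {"good", "great", "funny", "brilliant", "love"}
NEGATIVE_WORDS = {"bad", "boring", "dull", "awful", "hate"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def bootstrap_label(text, margin=1):
    """Return 'pos', 'neg', or None when the keyword evidence is too weak."""
    tokens = tokenize(text)
    pos_hits = sum(token in POSITIVE_WORDS for token in tokens)
    neg_hits = sum(token in NEGATIVE_WORDS for token in tokens)
    if pos_hits - neg_hits >= margin:
        return "pos"
    if neg_hits - pos_hits >= margin:
        return "neg"
    return None


def bootstrap_corpus(documents, required_phrase=None):
    """Split documents into auto-labelled ones and ones left for manual review."""
    labelled, undecided = [], []
    for text in documents:
        # Skip documents that do not mention the phrase of interest.
        if required_phrase and required_phrase.lower() not in text.lower():
            continue
        label = bootstrap_label(text)
        if label is None:
            undecided.append(text)
        else:
            labelled.append((text, label))
    return labelled, undecided


docs = [
    "Monty Python is brilliant and funny.",
    "I found Monty Python boring and dull.",
    "Monty Python aired on the BBC.",
    "A great film about something else.",
]
labelled, undecided = bootstrap_corpus(docs, required_phrase="Monty Python")
for text, label in labelled:
    print(label, "|", text)
```

The automatically assigned labels are only a first draft: the documents in `undecided`, and ideally a sample of the labelled ones, still need to be checked by hand before training.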


Jacob


Creating two lists of keywords means I have to save two lists, one of positive and one of negative keywords? But I want to tag a whole document as positive or negative. Is that possible? – Hirak Sarkar May 21 '12 at 17:43
The idea with the two keyword lists is that you can use them to tag the documents automatically instead of doing it manually. – Jacob May 21 '12 at 23:04
