I have been reading about text classification and found several Java tools for it, but I am still wondering: is text classification the same as sentence classification?

Is there any tool which focuses on sentence classification?

jogojapan
S Gaber
  • What about splitting a text into several texts each containing one sentence? Then you could use text classification :) – Thomas Apr 18 '12 at 08:26
  • OK, this is a good idea. So I could use the same text classification tool for sentence classification as well! – S Gaber Apr 18 '12 at 08:32
  • "Text" is a collective term for anything from a single word to a novel, so long as it consists of words. – mbatchkarov Apr 18 '12 at 09:44
  • @reseter: but the kind of features you use for single words is quite different from the kind you use in document classification. – Fred Foo Apr 18 '12 at 22:39
  • @larsmans: indeed, but the classifiers you put the feature vectors in are all the same. – mbatchkarov Apr 19 '12 at 07:53
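
To illustrate the splitting idea from the comments above, here is a minimal sketch that breaks a text into sentences with the JDK's built-in `java.text.BreakIterator`. The class name `SentenceSplitter` and the sample text are made up for illustration, and the JDK's rule-based splitter is only one option; a dedicated NLP toolkit may handle abbreviations and unusual punctuation better.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: splits a text into sentences so that each sentence
// can be handed to an ordinary text classifier as its own "document".
final class SentenceSplitter {

    static List<String> split(String text) {
        List<String> sentences = new ArrayList<>();
        // Rule-based sentence boundary detection shipped with the JDK.
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        String text = "The food was great. The service was slow! Would I return?";
        for (String sentence : split(text)) {
            System.out.println(sentence);
        }
    }
}
```

Each returned string can then be treated as a separate instance by whatever text classification tool you already use.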

1 Answer

There's no formal difference between 'text classification' and 'sentence classification'. After all, a sentence is a type of text. But generally, when people talk about text classification, IMHO they mean larger units of text such as an essay, review or speech. Classifying a politician's speech as Democrat or Republican is a lot easier than classifying a tweet. When you have a lot of text per instance, you don't need to squeeze each training instance for all the information it can give you, and you can get pretty good performance out of a bag-of-words Naive Bayes model.

Basically, you might not get the performance you need if you throw off-the-shelf Weka classifiers at a corpus of sentences. You might have to augment each sentence with POS tags, parse trees, word order, n-grams, etc. Also collect any related metadata such as creation time, creation location, attributes of the sentence's author, etc. Obviously all of this depends on what exactly you are trying to classify: the features that work for you need to be intuitively meaningful for the problem at hand.
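
As a rough illustration of one of these augmentations, the sketch below turns a sentence into unigram plus bigram features, so that some word-order information survives. The class name `NgramFeatures`, the tokenizer regex and the `_` joiner are my own assumptions, not anything Weka prescribes; POS tags and parse trees would need an external NLP library and are not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical feature extractor: unigrams plus adjacent-word bigrams.
final class NgramFeatures {

    static List<String> extract(String sentence) {
        // Naive tokenizer (an assumption): lowercase, split on anything that
        // is not a letter, digit or apostrophe.
        List<String> words = new ArrayList<>();
        for (String token : sentence.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}']+")) {
            if (!token.isEmpty()) {
                words.add(token);
            }
        }
        // Start with the unigrams, then append one bigram per adjacent pair.
        List<String> features = new ArrayList<>(words);
        for (int i = 0; i + 1 < words.size(); i++) {
            features.add(words.get(i) + "_" + words.get(i + 1));
        }
        return features;
    }

    public static void main(String[] args) {
        System.out.println(extract("Not good at all"));
    }
}
```

With a plain bag of words, "not good" and "good" both contribute the feature `good`; the bigram `not_good` gives the classifier a chance to tell them apart, which matters more when each instance is only one sentence long.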

Aditya Mukherji
  • Thanks adi92, is there any detailed tutorial that I can follow to apply this model? – S Gaber Apr 19 '12 at 06:19
  • There are two parts to a machine learning task: 1) finding the right features, i.e. a vector of numbers to describe each training instance (in your case, a sentence), and 2) training a model using all those feature vectors. My advice to you was wholly about feature selection (i.e. point 1), nothing about which model to use. If you don't have any model in mind, Naive Bayes would be a good place to start. It's hard for me to recommend a tutorial without knowing how much ML, math and programming you already know and what time constraints you are working with. – Aditya Mukherji Apr 19 '12 at 07:06
  • I just googled around and found this very basic intro to what Naive Bayes means http://bionicspirit.com/blog/2012/02/09/howto-build-naive-bayes-classifier.html – Aditya Mukherji Apr 19 '12 at 07:07
  • This stackoverflow question is a little more involved - http://stackoverflow.com/questions/10059594/a-simple-explanation-of-naive-bayes-classification – Aditya Mukherji Apr 19 '12 at 07:08
  • If you have the time and motivation to understand Naive Bayes in great detail, search YouTube for 'CS229 Lecture 5' and 'CS229 Lecture 6' and watch those 2 videos. On the opposite side of the spectrum, if you don't care much about the details and just want to see something working, you can set up a simple Naive Bayes model in Weka (which I guess you already know about because of the question tag). – Aditya Mukherji Apr 19 '12 at 07:10
  • +1 for this answer, but note that a bag-of-words (multinomial) Naive Bayes may not be appropriate for small units of text. A Bernoulli Naive Bayes may give better results. – Fred Foo Apr 19 '12 at 09:45
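
To make the two steps from the comments concrete (features first, then a model), here is a minimal from-scratch sketch of the Bernoulli Naive Bayes variant mentioned in the last comment. It is not Weka's implementation; the class name, the tokenizer, the toy training sentences and the add-one smoothing are all assumptions chosen to keep the example short.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical minimal Bernoulli Naive Bayes for short texts: each feature
// records whether a vocabulary word is present or absent in the sentence.
final class BernoulliNaiveBayes {
    private final Set<String> vocabulary = new HashSet<>();
    private final Map<String, Integer> sentencesPerLabel = new HashMap<>();
    private final Map<String, Map<String, Integer>> sentencesWithWord = new HashMap<>();
    private int totalSentences = 0;

    // Step 1 (features): the set of distinct lowercase words in the sentence.
    private static Set<String> features(String sentence) {
        Set<String> words = new HashSet<>();
        for (String t : sentence.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}']+")) {
            if (!t.isEmpty()) words.add(t);
        }
        return words;
    }

    // Step 2 (model): count, per label, how many sentences contain each word.
    void train(String sentence, String label) {
        Set<String> words = features(sentence);
        vocabulary.addAll(words);
        sentencesPerLabel.merge(label, 1, Integer::sum);
        Map<String, Integer> counts = sentencesWithWord.computeIfAbsent(label, k -> new HashMap<>());
        for (String w : words) counts.merge(w, 1, Integer::sum);
        totalSentences++;
    }

    // Returns the label with the highest log-probability, or null if untrained.
    String classify(String sentence) {
        Set<String> words = features(sentence);
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Integer> e : sentencesPerLabel.entrySet()) {
            int n = e.getValue();
            Map<String, Integer> counts = sentencesWithWord.get(e.getKey());
            double score = Math.log((double) n / totalSentences); // class prior
            for (String w : vocabulary) {
                // Add-one smoothed estimate of P(word present | label).
                double p = (counts.getOrDefault(w, 0) + 1.0) / (n + 2.0);
                // Unlike the multinomial model, absent words also contribute.
                score += Math.log(words.contains(w) ? p : 1.0 - p);
            }
            if (score > bestScore) {
                bestScore = score;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        BernoulliNaiveBayes nb = new BernoulliNaiveBayes();
        nb.train("the food was great", "pos");
        nb.train("I loved the service", "pos");
        nb.train("the food was terrible", "neg");
        nb.train("I hated the service", "neg");
        System.out.println(nb.classify("great food"));
    }
}
```

Words that never appeared in training are ignored at classification time. For real data you would swap the toy tokenizer for better features and evaluate on held-out sentences before trusting either variant.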