I've tested the three Lucene stemmers available in org.apache.lucene.analysis.en (version 4.4.0), namely EnglishMinimalStemFilter, KStemFilter and PorterStemFilter, on a document classification problem I'm working on. My results corroborate the claims made by the authors of Introduction to Information Retrieval that, in document classification settings, stemming is harmful for small training corpora and makes no difference for large corpora.
For search and indexing, stemming can be more useful (see, e.g., Jenkins & Smith), but even there the answer to your question depends on the details of what you're doing. There is no free lunch!
At the end of the day, nothing beats empirical tests of real code on real data. The only way you'll really know which is better is by running the stemmers for yourself in your application.
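To make that concrete, here is a minimal sketch of the kind of comparison harness I mean. It is not my actual test code: the `Doc` class, the `toyStem` function (it just strips a trailing "s") and the 1-nearest-neighbour scorer are all hypothetical stand-ins so the example runs with only the JDK. In a real test you would replace `toyStem` with the output of the Lucene filters and `accuracy` with your own classifier and held-out data.

```java
import java.util.*;
import java.util.function.UnaryOperator;

public class StemmerComparison {

    /** A labelled document. Hypothetical helper type, for illustration only. */
    static final class Doc {
        final String label, text;
        Doc(String label, String text) { this.label = label; this.text = text; }
    }

    /** Toy stand-in for a stemmer: strips one trailing "s". NOT a Lucene filter. */
    static String toyStem(String token) {
        return token.length() > 3 && token.endsWith("s")
                ? token.substring(0, token.length() - 1) : token;
    }

    /** Lower-cases, splits on non-word characters, and applies the stemmer. */
    static Set<String> terms(String text, UnaryOperator<String> stemmer) {
        Set<String> out = new HashSet<>();
        for (String tok : text.toLowerCase().split("\\W+")) {
            if (!tok.isEmpty()) out.add(stemmer.apply(tok));
        }
        return out;
    }

    /** Accuracy of a 1-nearest-neighbour classifier that scores by shared terms. */
    static double accuracy(List<Doc> train, List<Doc> test, UnaryOperator<String> stemmer) {
        int correct = 0;
        for (Doc t : test) {
            Set<String> query = terms(t.text, stemmer);
            String best = null;
            int bestOverlap = -1;
            for (Doc d : train) {
                Set<String> shared = terms(d.text, stemmer);
                shared.retainAll(query);               // terms common to both documents
                if (shared.size() > bestOverlap) {     // ties keep the earlier training doc
                    bestOverlap = shared.size();
                    best = d.label;
                }
            }
            if (t.label.equals(best)) correct++;
        }
        return (double) correct / test.size();
    }

    public static void main(String[] args) {
        // Made-up data, only here to exercise the harness.
        List<Doc> train = Arrays.asList(
                new Doc("pets", "cats and dogs"),
                new Doc("cars", "engines and wheels"));
        List<Doc> test = Arrays.asList(
                new Doc("pets", "a cat"),
                new Doc("cars", "one wheel"));

        // One entry per normalisation strategy you want to compare.
        Map<String, UnaryOperator<String>> stemmers = new LinkedHashMap<>();
        stemmers.put("none", UnaryOperator.identity());
        stemmers.put("toy", StemmerComparison::toyStem);

        for (Map.Entry<String, UnaryOperator<String>> e : stemmers.entrySet()) {
            System.out.printf("%s: %.2f%n", e.getKey(), accuracy(train, test, e.getValue()));
        }
    }
}
```

The four made-up documents say nothing about which stemmer is better. The point is only the shape of the experiment: keep the data and the classifier fixed, vary the stemmer, and compare the scores on held-out documents.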