
I'm looking for a way to stem strings in Java. At first I wanted to do it with Lucene, but all the examples I found on the web were deprecated (`SnowballAnalyzer`, `PorterStemmer`, ...). I just want to stem whole sentences:

```java
public static String stemSentence(String sentence) {
    ...
    return stemmedSentence;
}
```

How can I do it?

Mulgard
  • This link has some solutions for you: http://stackoverflow.com/questions/5391840/stemming-english-words-with-lucene – francisco Jun 07 '14 at 11:17
  • Lucene's stemming analyzers are all language specific, housed in the [org.apache.lucene.analysis](http://lucene.apache.org/core/4_8_0/analyzers-common/org/apache/lucene/analysis/) package. Pick your language and away you go. (also, while `SnowballAnalyzer` is certainly deprecated, `PorterStemmer` is not. It's used by `EnglishAnalyzer`, after all) – femtoRgon Jun 07 '14 at 15:11
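Following femtoRgon's comment, here is a minimal sketch that uses `EnglishAnalyzer` directly instead of building a filter chain by hand. It assumes Lucene 4.7 (lucene-core and lucene-analyzers-common) is on the classpath. The class name `EnglishStemming` and the field name `"text"` are hypothetical, and the field name is arbitrary here. Unlike a bare `PorterStemFilter` chain, `EnglishAnalyzer` also drops English stop words and possessive `'s`, so the output can contain fewer words than the input.

```java
import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.en.EnglishAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

// Hypothetical helper class; assumes the Lucene 4.7 API (Version-based constructors).
public class EnglishStemming {

    // Runs the sentence through EnglishAnalyzer (which uses PorterStemFilter
    // internally) and joins the resulting terms with single spaces.
    // Stop words such as "the" or "are" are removed by the analyzer.
    public static String stemSentence(String sentence) throws IOException {
        StringBuilder result = new StringBuilder();
        // Resources are closed in reverse order: the stream first, then the analyzer.
        try (Analyzer analyzer = new EnglishAnalyzer(Version.LUCENE_47);
             TokenStream stream = analyzer.tokenStream("text", new StringReader(sentence))) {
            CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
            stream.reset();                 // required before the first incrementToken()
            while (stream.incrementToken()) {
                if (result.length() > 0) {
                    result.append(' ');
                }
                result.append(term.toString());
            }
            stream.end();                   // signals end of stream to the filters
        }
        return result.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(stemSentence("The cats are running"));
    }
}
```

If you want to keep stop words, the hand-built `StandardTokenizer` → `LowerCaseFilter` → `PorterStemFilter` chain is the simpler option.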

1 Answer


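You can build the analysis chain yourself. This uses the Lucene 4.7 API (`Version.LUCENE_47`):

```java
import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.en.PorterStemFilter;
import org.apache.lucene.analysis.standard.StandardFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public static String stem(String string) throws IOException {
    // Tokenize the input, lowercase every token (PorterStemFilter expects
    // lowercase input), then apply the Porter stemmer to each token.
    TokenStream tokenizer = new StandardTokenizer(Version.LUCENE_47, new StringReader(string));
    tokenizer = new StandardFilter(Version.LUCENE_47, tokenizer);
    tokenizer = new LowerCaseFilter(Version.LUCENE_47, tokenizer);
    tokenizer = new PorterStemFilter(tokenizer);

    // Holds the text of the current token; refreshed by each incrementToken() call.
    CharTermAttribute token = tokenizer.addAttribute(CharTermAttribute.class);

    StringBuilder stringBuilder = new StringBuilder();
    try {
        tokenizer.reset(); // mandatory before consuming the stream
        while (tokenizer.incrementToken()) {
            if (stringBuilder.length() > 0) {
                stringBuilder.append(" ");
            }
            stringBuilder.append(token.toString());
        }
        tokenizer.end();
    } finally {
        tokenizer.close();
    }

    return stringBuilder.toString();
}
```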
Mulgard