Tagging systems need stemming. I use Delicious, and I don't have time to manage and prune my tags. I'm a bit more careful with my blog, but it isn't perfect. I also write software for embedded systems that would be much more functional (helpful to the user) if it included stemming.

For instance:
Parse
Parser
Parsing

Should all mean the same thing to whatever system I'm putting them into.
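
To make the goal concrete, here is a minimal sketch of collapsing tag variants onto one key. It is a toy suffix stripper, not the Porter or Snowball algorithm; the suffix list and the names `naive_stem` and `group_tags` are hypothetical and tuned only for the example above.

```python
from collections import defaultdict

# Hypothetical suffix list, chosen only to cover the Parse/Parser/Parsing example.
# A real stemmer (e.g. Porter) uses many more rules and conditions.
SUFFIXES = ("ing", "er", "e")


def naive_stem(tag: str) -> str:
    """Lowercase the tag and strip the first matching suffix, keeping at least 3 letters."""
    word = tag.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def group_tags(tags):
    """Group the original tags under their shared stem."""
    groups = defaultdict(list)
    for tag in tags:
        groups[naive_stem(tag)].append(tag)
    return dict(groups)


if __name__ == "__main__":
    print(group_tags(["Parse", "Parser", "Parsing", "python"]))
```

A rule set this small will mangle plenty of real words, which is why a tested stemming library is the better choice for anything beyond a demonstration.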

Ideally there's a BSD licensed stemmer somewhere, but if not, where do I look to learn the common algorithms and techniques for this?

Aside from BSD stemmers, what other open source licensed stemmers are out there?

-Adam

Adam Davis

4 Answers

Snowball stemmer (C & Java). I've used its Python binding, PyStemmer.

vartec

Check out NLTK, a natural language toolkit written in Python. It has a very functional stemmer.

Anand

Another option for stemming would be WordNet, along with one of its APIs. Some basic information on stemming and lemmatization, including a description of the Porter stemming algorithm, can be found online in Introduction to Information Retrieval.
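
To give a flavour of how the Porter algorithm works, here is a sketch of only its first rule group (Step 1a, plural handling) as described in Porter's original paper. The function name `porter_step_1a` is my own, and this is a fragment for illustration, not a complete stemmer.

```python
def porter_step_1a(word: str) -> str:
    """Apply Porter's Step 1a: the longest matching suffix rule wins.

    SSES -> SS, IES -> I, SS -> SS, S -> (removed)
    """
    if word.endswith("sses"):
        return word[:-2]  # caresses -> caress
    if word.endswith("ies"):
        return word[:-2]  # ponies -> poni
    if word.endswith("ss"):
        return word       # caress -> caress
    if word.endswith("s"):
        return word[:-1]  # cats -> cat
    return word


if __name__ == "__main__":
    for w in ("caresses", "ponies", "caress", "cats"):
        print(w, "->", porter_step_1a(w))
```

The full algorithm chains several more steps like this one, each guarded by conditions on the remaining stem, so the output is a normalized key rather than a dictionary word.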

Fabian Steeg

Lucene has a stemmer built in, I believe (and IIRC it lets you plug in your own if you want).

EDIT: Just checked, and Lucene refers to the Snowball site, which is an open source stemming library as far as I can tell.

Jon Skeet