
Why does the Porter stemming algorithm online at

http://text-processing.com/demo/stem/

stem fried to fri and not fry?

I can't recall any English past-tense form ending in -ied whose base form ends in i.

Is this a bug?

Nordlöw

2 Answers


A stem as returned by the Porter stemmer is not necessarily the base form of a verb, or even a valid word. If that is what you need, use a lemmatizer instead.
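
To see where `fri` comes from: step 1b of the original Porter algorithm drops a trailing `ed` whenever the remaining stem contains a vowel, and makes no attempt to restore a dictionary form. Below is a rough sketch of that single rule only. It is not the full algorithm (it skips the `eed` rule and the clean-up that follows the deletion), and the helper names `contains_vowel` and `strip_ed` are made up for illustration.

```python
VOWELS = set("aeiou")


def contains_vowel(stem):
    """Return True if the stem has a vowel.

    As in Porter's definition, 'y' counts as a vowel
    when it follows a consonant.
    """
    for i, ch in enumerate(stem):
        if ch in VOWELS:
            return True
        if ch == "y" and i > 0 and stem[i - 1] not in VOWELS:
            return True
    return False


def strip_ed(word):
    """Simplified form of the step 1b rule: (*v*) ED -> ''.

    Assumption: the 'eed' rule and the post-deletion clean-up
    of the real algorithm are deliberately left out.
    """
    if word.endswith("ed") and contains_vowel(word[:-2]):
        return word[:-2]
    return word


print(strip_ed("fried"))  # 'fri': the stem 'fri' contains a vowel, so 'ed' is dropped
print(strip_ed("bled"))   # 'bled': the stem 'bl' has no vowel, so nothing changes
```

The rule only removes a suffix; nothing in it knows that the `i` in `fried` came from a `y`.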

Daniel Naber
  • Great! Could someone give a link to some algorithm (paper or code) describing how the stemmer and lemmatizer are used in conjunction to, for instance, convert `friedness` to `fry`? Does the lemmatizer always operate on the output of the stemming algorithm or does it need both the original and stemmed version of the word? – Nordlöw Dec 26 '14 at 16:52
  • Not always. For example, the lemma of most forms of "be"--"is", "was", "were", "are"--can't be determined from the stem. Stemming is simpler but error prone. In certain applications, though, that's acceptable. A lemmatizer may also use a stemmer as a fall-back. – Dan Dec 27 '14 at 18:50
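
The fall-back idea from the comment above could look roughly like this. It is only a sketch under heavy assumptions: `IRREGULAR_LEMMAS` is a tiny hand-written table standing in for a real lexicon, and `toy_stem` is a crude suffix stripper standing in for a real stemmer; neither is taken from any library.

```python
# Hypothetical exception table; a real lemmatizer would consult a full lexicon.
IRREGULAR_LEMMAS = {
    "is": "be",
    "was": "be",
    "were": "be",
    "are": "be",
    "fried": "fry",
}


def toy_stem(word):
    """Crude stand-in for a stemmer: strip one common suffix if enough remains."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def lemmatize_with_fallback(word):
    """Look the word up first; fall back to stemming only when it is unknown."""
    word = word.lower()
    if word in IRREGULAR_LEMMAS:
        return IRREGULAR_LEMMAS[word]
    return toy_stem(word)


print(lemmatize_with_fallback("was"))     # found in the table
print(lemmatize_with_fallback("walked"))  # not in the table, so the stemmer handles it
```

The lookup runs on the original word, not on the stem, which is why forms such as "was" can still be resolved.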

First, a stemmer is not a lemmatizer (see also Stemmers vs Lemmatizers):

>>> from nltk.stem import PorterStemmer, WordNetLemmatizer
>>> porter = PorterStemmer()
>>> wnl = WordNetLemmatizer()
>>> fried = 'fried'
>>> porter.stem(fried)
u'fri'
>>> wnl.lemmatize(fried)
'fried'

Next, a lemmatizer is part-of-speech (POS) sensitive. WordNetLemmatizer treats the word as a noun unless you pass a different pos:

>>> wnl.lemmatize(fried, pos='v')
u'fry'
alvas