Lemmatization in linguistics is the process of grouping together the different inflected forms of a word so they can be analyzed as a single item.
Questions tagged [lemmatization]
436 questions
202 votes, 14 answers
What is the difference between lemmatization vs stemming?
When do I use each?
Also, is NLTK lemmatization dependent on parts of speech?
Wouldn't it be more accurate if it was?
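
A minimal sketch of the distinction the question asks about, assuming a toy suffix list and a toy lookup table (both hypothetical; real stemmers and lemmatizers use far larger rule sets and dictionaries). A stemmer strips suffixes without checking that the result is a word, while a lemmatizer returns a dictionary form and typically needs the part of speech to choose it:

```python
# Toy illustration only, not NLTK's implementation.

SUFFIXES = ("ing", "ies", "es", "s", "ed")  # hypothetical rule list

def toy_stem(word):
    """Strip the first matching suffix, with no dictionary check."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

# Hypothetical lookup table keyed by (word, part of speech).
LEMMAS = {
    ("running", "v"): "run",
    ("ran", "v"): "run",
    ("communities", "n"): "community",
    ("cacti", "n"): "cactus",
}

def toy_lemmatize(word, pos="n"):
    """Return the dictionary form for (word, pos), or the word unchanged."""
    return LEMMAS.get((word, pos), word)

for w in ["running", "ran", "communities", "cacti"]:
    print(w, toy_stem(w), toy_lemmatize(w, "v"), toy_lemmatize(w, "n"))
```

In this sketch the stemmer turns "running" into the non-word "runn", and the lemmatizer returns "run" only when the word is looked up as a verb, which is one way to see why part-of-speech information can matter.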

TIMEX
114 votes, 22 answers
How do I do word Stemming or Lemmatization?
I've tried PorterStemmer and Snowball, but neither works on all words; both miss some very common ones.
My test words are: "cats running ran cactus cactuses cacti community communities", and both get less than half right.
See also:
Stemming…

manixrock
81 votes, 4 answers
Stemmers vs Lemmatizers
Natural Language Processing (NLP), especially for English, has evolved to the stage where stemming would become an archaic technology if "perfect" lemmatizers existed. That is because stemmers change the surface form of a word/token into some…

alvas
77 votes, 8 answers
wordnet lemmatization and pos tagging in python
I wanted to use wordnet lemmatizer in python and I have learnt that the default pos tag is NOUN and that it does not output the correct lemma for a verb, unless the pos tag is explicitly specified as VERB.
My question is what is the best shot…
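
One common approach to the situation described here is to convert a tagger's Penn Treebank tags into WordNet's single-letter part-of-speech codes before lemmatizing. A minimal sketch, assuming Penn Treebank input tags; the function name `penn_to_wordnet` is hypothetical, and the letter codes are written out as plain strings rather than imported from NLTK:

```python
# WordNet's single-letter POS codes: noun, verb, adjective, adverb.
NOUN, VERB, ADJ, ADV = "n", "v", "a", "r"

def penn_to_wordnet(tag, default=NOUN):
    """Map a Penn Treebank tag (e.g. 'VBD') to a WordNet POS code.

    Tags with no WordNet counterpart fall back to `default`.
    """
    if tag.startswith("J"):   # JJ, JJR, JJS
        return ADJ
    if tag.startswith("V"):   # VB, VBD, VBG, VBN, VBP, VBZ
        return VERB
    if tag.startswith("N"):   # NN, NNS, NNP, NNPS
        return NOUN
    if tag.startswith("RB"):  # RB, RBR, RBS (but not RP, a particle)
        return ADV
    return default

print(penn_to_wordnet("VBD"), penn_to_wordnet("JJR"), penn_to_wordnet("DT"))
```

The returned code could then be passed as the `pos` argument when calling the lemmatizer, so that verbs are no longer looked up as nouns.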

user1946217
39 votes, 6 answers
How to use spaCy's lemmatizer to get a word into its base form
I am new to spaCy and want to use its lemmatizer, but I don't know how. I want to pass in a string of words and get back the string with each word in its base form.
Examples:
'words'=> 'word'
'did' => 'do'
Thank you.

yi wang
31 votes, 2 answers
word2vec lemmatization of corpus before training
Word2vec seems to be mostly trained on raw corpus data. However, lemmatization is a standard preprocessing step for many semantic similarity tasks. I was wondering if anybody had experience in lemmatizing the corpus before training word2vec and if this…

Luca Fiaschi
30 votes, 6 answers
How to perform Lemmatization in R?
This question is a possible duplicate of Lemmatizer in R or python (am, are, is -> be?), but I'm adding it again since the previous one was closed saying it was too broad and the only answer it has is not efficient (as it accesses an external…

StrikeR
29 votes, 5 answers
Lemmatize French text
I have some text in French that I need to process in some ways. For that, I need to:
First, tokenize the text into words
Then lemmatize those words to avoid processing the same root more than once
As far as I can see, the wordnet lemmatizer in the…

yelsayed
23 votes, 13 answers
How to turn plural words singular?
I'm preparing some table names for an ORM, and I want to turn plural table names into single entity names. My only problem is finding an algorithm that does it reliably. Here's what I'm doing right now:
If a word ends with -ies, I replace the…
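
A rule-based singularizer of the general kind described here can be sketched as an irregular-word table plus an ordered list of suffix rules. This is an illustration under stated assumptions, not the asker's algorithm: the rules, the `IRREGULAR` table, and the function name `singularize` are all hypothetical, and the input is lowercased for simplicity.

```python
import re

# Hypothetical irregular forms; a real table would be much longer.
IRREGULAR = {"people": "person", "children": "child", "mice": "mouse"}

# Ordered (pattern, replacement) rules; the first match wins.
RULES = [
    (re.compile(r"ies$"), "y"),                 # categories -> category
    (re.compile(r"(ss|sh|ch|x|z)es$"), r"\1"),  # boxes -> box, classes -> class
    (re.compile(r"([^s])s$"), r"\1"),           # users -> user, keeps "address"
]

def singularize(name):
    """Return a guessed singular form of a plural table name."""
    lower = name.lower()
    if lower in IRREGULAR:
        return IRREGULAR[lower]
    for pattern, repl in RULES:
        if pattern.search(lower):
            return pattern.sub(repl, lower)
    return lower

for table in ["categories", "boxes", "classes", "users", "address", "People"]:
    print(table, "->", singularize(table))
```

Rules like these still mishandle words such as "status", "series", or "movies", which is why purely suffix-based approaches tend to need a growing exception list.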

Dmitri Nesteruk
19 votes, 2 answers
Is it possible to speed up Wordnet Lemmatizer?
I'm using the Wordnet Lemmatizer via NLTK on the Brown Corpus (to determine if the nouns in it are used more in their singular form or their plural form).
i.e.
from nltk.stem.wordnet import WordNetLemmatizer
l = WordNetLemmatizer()
I've noticed…
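
The excerpt is cut off before it says what was observed, so the following is only a general sketch of one way to reduce repeated work: memoizing the lemmatizer call, since a natural-language corpus repeats the same word forms many times. `slow_lemmatize` is a hypothetical stand-in for an expensive lookup, not the NLTK function.

```python
from functools import lru_cache

def slow_lemmatize(word):
    """Stand-in for an expensive call such as a dictionary lookup (hypothetical)."""
    slow_lemmatize.calls += 1  # count how often the expensive path runs
    return word[:-1] if word.endswith("s") and len(word) > 3 else word

slow_lemmatize.calls = 0

@lru_cache(maxsize=None)
def cached_lemmatize(word):
    """Compute each distinct word form once and reuse the result afterwards."""
    return slow_lemmatize(word)

tokens = ["dogs", "cats", "dogs", "dogs", "cats"]
lemmas = [cached_lemmatize(t) for t in tokens]
print(lemmas, "expensive calls:", slow_lemmatize.calls)
```

With five tokens but only two distinct word forms, the expensive function runs twice. How much this helps in practice depends on how repetitive the corpus is and on whether the underlying library already caches internally.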

ess
18 votes, 2 answers
Sklearn: adding lemmatizer to CountVectorizer
I added lemmatization to my CountVectorizer, as explained on this Sklearn page.
from nltk import word_tokenize
from nltk.stem import WordNetLemmatizer
class LemmaTokenizer(object):
    def __init__(self):
        self.wnl =…

Rens
15 votes, 3 answers
How does the spaCy lemmatizer work?
For lemmatization, spaCy has lists of words (adjectives, adverbs, verbs...) and also lists of exceptions (adverbs_irreg...); for the regular ones there is a set of rules.
Let's take as example the word "wider"
As it is an adjective the rule for…
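
The scheme described above (an exception list, a word index, and suffix rules for regular forms) can be sketched in a few lines. This follows the question's description only and is not spaCy's actual code; the tables and the name `lemmatize_adj` are hypothetical miniatures.

```python
# Hypothetical miniature tables; real ones hold thousands of entries.
ADJ_INDEX = {"wide", "big", "good", "nice"}          # known base forms
ADJ_EXCEPTIONS = {"better": "good", "best": "good"}  # irregular forms
ADJ_RULES = [("er", ""), ("est", ""), ("er", "e"), ("est", "e")]

def lemmatize_adj(word):
    """Lemmatize an adjective: exceptions first, then index, then suffix rules."""
    word = word.lower()
    if word in ADJ_EXCEPTIONS:
        return ADJ_EXCEPTIONS[word]
    if word in ADJ_INDEX:
        return word
    for old, new in ADJ_RULES:
        if word.endswith(old):
            candidate = word[: len(word) - len(old)] + new
            # Accept a rule's output only if it is a known base form.
            if candidate in ADJ_INDEX:
                return candidate
    return word  # no rule produced a known word; leave unchanged

print(lemmatize_adj("wider"), lemmatize_adj("nicest"), lemmatize_adj("better"))
```

For "wider", the first applicable rule yields "wid", which is not in the index, so the next rule is tried and produces "wide". Checking candidates against the index is what keeps the rules from returning non-words.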

Luis Ramon Ramirez Rodriguez
14 votes, 3 answers
Multilingual NLTK for POS Tagging and Lemmatizer
I recently started with NLP and tried using NLTK and TextBlob to analyze texts. I would like to develop an app that analyzes reviews written by travelers, so I have to handle a lot of text written in different languages. I need to do two…

Alessio Schiavelli
11 votes, 1 answer
WordNetLemmatizer not returning the right lemma unless POS is explicit - Python NLTK
I'm lemmatizing the TED Dataset Transcript. I've noticed something strange:
Not all words are being lemmatized. For example,
selected -> select
Which is right.
However, involved !-> involve and horsing !-> horse unless I explicitly input the 'v'…

FlyingAura
11 votes, 1 answer
Is there a good stemmer for Hebrew?
I am looking for a good stemmer for Hebrew - I found nothing at all using Google...
On the HebMorph site it says that:
Stem and Lemma originally have different meanings, but for Semitic languages they seem to be used interchangeably.
Does that mean…

Cheshie