Highest Voted 'lemmatization' Questions

202

votes

14 answers

What is the difference between lemmatization and stemming?

When do I use each? Also, does NLTK lemmatization depend on parts of speech? Wouldn't it be more accurate if it did?

nlp nltk lemmatization

asked Nov 24 '09 at 00:48

TIMEX

114

votes

22 answers

How do I do word Stemming or Lemmatization?

I've tried PorterStemmer and Snowball, but neither works on all words; both miss some very common ones. My test words are: "cats running ran cactus cactuses cacti community communities", and both get less than half right. See also: Stemming…

nlp stemming lemmatization

asked Apr 21 '09 at 10:07

manixrock

81

votes

4 answers

Stemmers vs Lemmatizers

Natural Language Processing (NLP), especially for English, has evolved to the stage where stemming would become an archaic technology if "perfect" lemmatizers existed. That is because stemmers change the surface form of a word/token into some…

nlp wordnet stemming text-analysis lemmatization

asked Jun 26 '13 at 10:19

alvas

77

votes

8 answers

WordNet lemmatization and POS tagging in Python

I want to use the WordNet lemmatizer in Python, and I have learned that the default POS tag is NOUN and that it does not output the correct lemma for a verb unless the POS tag is explicitly specified as VERB. My question is what is the best shot…
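
One common way to deal with the noun default is to run a POS tagger first and translate its tags into the single-letter codes WordNet uses ('n', 'v', 'a', 'r'). The sketch below shows only that translation step. The helper name penn_to_wordnet_pos is made up for illustration, and it assumes the tagger emits Penn Treebank tags, as nltk.pos_tag does by default.

```python
def penn_to_wordnet_pos(penn_tag: str) -> str:
    """Map a Penn Treebank tag to a WordNet POS letter (hypothetical helper).

    Falls back to 'n' (noun), which mirrors the lemmatizer's own default.
    """
    if penn_tag.startswith("J"):
        return "a"  # adjectives: JJ, JJR, JJS
    if penn_tag.startswith("V"):
        return "v"  # verbs: VB, VBD, VBG, VBN, VBP, VBZ
    if penn_tag.startswith("R"):
        return "r"  # adverbs: RB, RBR, RBS (rough: also catches RP)
    return "n"


# Intended use with NLTK (not executed here; needs nltk plus its data files):
#   from nltk import pos_tag, word_tokenize
#   from nltk.stem import WordNetLemmatizer
#   wnl = WordNetLemmatizer()
#   lemmas = [wnl.lemmatize(word, pos=penn_to_wordnet_pos(tag))
#             for word, tag in pos_tag(word_tokenize(text))]

tags = {tag: penn_to_wordnet_pos(tag) for tag in ["VBD", "NNS", "JJR", "RB", "DT"]}
```

The prefix test is deliberately coarse; how well the final lemmas come out still depends on the accuracy of the tagger.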

python nltk wordnet lemmatization

asked Mar 23 '13 at 12:23

user1946217

39

votes

6 answers

How to use spaCy's lemmatizer to get a word into its basic form

I am new to spaCy and want to use its lemmatizer function, but I don't know how: I want to pass in a string of words and get back the string with each word in its basic form. Examples: 'words' => 'word', 'did' => 'do'. Thank you.

python nltk spacy lemmatization

asked Aug 04 '16 at 09:04

yi wang

31

votes

2 answers

word2vec lemmatization of corpus before training

Word2vec seems to be mostly trained on raw corpus data. However, lemmatization is a standard preprocessing step for many semantic similarity tasks. I was wondering if anybody had experience in lemmatizing the corpus before training word2vec and if this…

nlp word2vec gensim lemmatization

asked May 26 '14 at 20:35

Luca Fiaschi

30

votes

6 answers

How to perform Lemmatization in R?

This question is a possible duplicate of Lemmatizer in R or python (am, are, is -> be?), but I'm adding it again since the previous one was closed saying it was too broad and the only answer it has is not efficient (as it accesses an external…

r nlp lemmatization

asked Jan 29 '15 at 11:55

StrikeR

29

votes

5 answers

Lemmatize French text

I have some text in French that I need to process in some ways. For that, I need to: first, tokenize the text into words; then, lemmatize those words to avoid processing the same root more than once. As far as I can see, the wordnet lemmatizer in the…

python nltk lemmatization

asked Oct 29 '12 at 23:27

yelsayed

23

votes

13 answers

How to turn plural words singular?

I'm preparing some table names for an ORM, and I want to turn plural table names into single entity names. My only problem is finding an algorithm that does it reliably. Here's what I'm doing right now: If a word ends with -ies, I replace the…
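
A rule-based approach of the kind the question starts to describe can be sketched as an irregular-word lookup followed by ordered suffix rules. Everything below is an illustrative assumption rather than the asker's actual rule set (the excerpt is cut off): the IRREGULAR table, the RULES list, and the singularize function are hypothetical names, and the rules cover only a few English patterns.

```python
import re

# Hypothetical irregular and invariant forms; a real table would be much longer.
IRREGULAR = {
    "people": "person",
    "children": "child",
    "indices": "index",
    "status": "status",
    "series": "series",
}

# Ordered (pattern, replacement) rules: the first match wins.
RULES = [
    (re.compile(r"ies$"), "y"),                 # categories -> category
    (re.compile(r"(ss|x|z|ch|sh)es$"), r"\1"),  # addresses -> address, boxes -> box
    (re.compile(r"(?<!s)s$"), ""),              # orders -> order; leaves "address" alone
]


def singularize(name: str) -> str:
    """Return a best-effort singular form of a lower-cased English noun."""
    lower = name.lower()
    if lower in IRREGULAR:
        return IRREGULAR[lower]
    for pattern, replacement in RULES:
        if pattern.search(lower):
            return pattern.sub(replacement, lower)
    return lower


singular_names = [singularize(t) for t in ["categories", "addresses", "orders", "people"]]
```

Rules like these inevitably misfire on words outside the table (for example, "movies" would become "movy" here), so the irregular list tends to grow over time, and checking the output against the actual table names is advisable.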

algorithm nlp lemmatization

asked Apr 28 '09 at 06:05

Dmitri Nesteruk

19

votes

2 answers

Is it possible to speed up Wordnet Lemmatizer?

I'm using the WordNet Lemmatizer via NLTK on the Brown Corpus (to determine whether the nouns in it are used more in their singular form or their plural form), i.e. from nltk.stem.wordnet import WordNetLemmatizer; l = WordNetLemmatizer(). I've noticed…
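
Because a corpus repeats the same tokens many times, one generic way to reduce lemmatization time is to memoize the word-to-lemma function. The sketch below is an assumption about what might help, not a measured result. The cached wrapper and the slow_lemmatize stand-in are hypothetical; in practice the wrapped callable would be something like WordNetLemmatizer().lemmatize.

```python
from functools import lru_cache
from typing import Callable, List


def cached(lemmatize: Callable[[str], str], maxsize: int = 100_000) -> Callable[[str], str]:
    """Wrap any word -> lemma function so repeated tokens are served from memory."""
    return lru_cache(maxsize=maxsize)(lemmatize)


call_log: List[str] = []  # records every call that reaches the underlying function


def slow_lemmatize(word: str) -> str:
    """Stand-in for an expensive lemmatizer (placeholder logic only)."""
    call_log.append(word)
    return word.rstrip("s")


fast_lemmatize = cached(slow_lemmatize)
tokens = ["dogs", "cats", "dogs", "dogs", "cats"]
lemmas = [fast_lemmatize(token) for token in tokens]
# The underlying function ran only once per distinct token.
distinct_calls = len(call_log)
```

How much this helps depends on how often tokens repeat and on whether the underlying lemmatizer already caches its lookups, so it is worth timing both versions on a sample of the corpus.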

nltk wordnet lemmatization

asked Apr 24 '13 at 00:30

ess

18

votes

2 answers

Sklearn: adding lemmatizer to CountVectorizer

I added lemmatization to my CountVectorizer, as explained on this scikit-learn page: from nltk import word_tokenize; from nltk.stem import WordNetLemmatizer; class LemmaTokenizer(object): def __init__(self): self.wnl =…

python scikit-learn lemmatization countvectorizer

asked Nov 21 '17 at 22:30

Rens

15

votes

3 answers

How does the spaCy lemmatizer work?

For lemmatization, spaCy has lists of words (adjectives, adverbs, verbs...) and also lists of exceptions (adverbs_irreg...). For the regular ones there is a set of rules. Let's take as an example the word "wider". As it is an adjective, the rule for…
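
The three ingredients the question lists (word lists, exception lists, and suffix rules) can be combined roughly as follows. This is a toy sketch under stated assumptions, not spaCy's code: ADJ_INDEX, ADJ_EXCEPTIONS, ADJ_RULES, and lemmatize_adj are hypothetical names with made-up contents, chosen so that "wider" can be traced by hand.

```python
from typing import List

# Toy data: all three tables are illustrative assumptions, not spaCy's actual tables.
ADJ_INDEX = {"wide", "big", "good"}      # known adjective lemmas
ADJ_EXCEPTIONS = {"better": ["good"]}    # irregular forms, looked up first
ADJ_RULES = [("er", ""), ("est", ""), ("er", "e"), ("est", "e")]


def lemmatize_adj(word: str) -> List[str]:
    """Return candidate lemmas for an adjective, or the word itself if none is found."""
    word = word.lower()
    if word in ADJ_EXCEPTIONS:
        return list(ADJ_EXCEPTIONS[word])
    candidates: List[str] = []
    for old_suffix, new_suffix in ADJ_RULES:
        if word.endswith(old_suffix):
            form = word[: len(word) - len(old_suffix)] + new_suffix
            # Keep a rewritten form only if it is a known lemma.
            if form in ADJ_INDEX and form not in candidates:
                candidates.append(form)
    return candidates or [word]


wider_lemmas = lemmatize_adj("wider")
```

For "wider", the rule ("er", "") yields "wid", which is not in the index and is discarded, while ("er", "e") yields "wide", which is kept. The index check is what separates a valid rewrite from an invalid one in this sketch.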

python nlp wordnet spacy lemmatization

asked May 05 '17 at 01:50

Luis Ramon Ramirez Rodriguez

14

votes

3 answers

Multilingual NLTK for POS Tagging and Lemmatizer

I recently started working with NLP and tried using NLTK and TextBlob to analyze texts. I would like to develop an app that analyzes reviews written by travelers, so I have to handle a lot of text written in different languages. I need to do two…

python nlp nltk pos-tagger lemmatization

asked Sep 23 '15 at 13:29

Alessio Schiavelli

11

votes

1 answer

WordNetLemmatizer not returning the right lemma unless POS is explicit - Python NLTK

I'm lemmatizing the TED Dataset transcript, and I notice something strange: not all words are being lemmatized. For example, selected -> select, which is right. However, involved !-> involve and horsing !-> horse unless I explicitly input the 'v'…

python nlp nltk wordnet lemmatization

asked Oct 05 '15 at 21:06

FlyingAura

11

votes

1 answer

Is there a good stemmer for Hebrew?

I am looking for a good stemmer for Hebrew - I found nothing at all using Google... On the HebMorph site it says that: Stem and Lemma originally have different meanings, but for Semitic languages they seem to be used interchangeably. Does that mean…

nlp hebrew stemming lemmatization

asked Jan 06 '14 at 15:39

Cheshie
