Highest Voted 'stemming' Questions

114

votes

22 answers

How do I do word Stemming or Lemmatization?

I've tried PorterStemmer and Snowball, but neither works on all words; both miss some very common ones. My test words are: "cats running ran cactus cactuses cacti community communities", and both get less than half right. See also: Stemming…

nlp stemming lemmatization

asked Apr 21 '09 at 10:07

manixrock

2,533
4
24
29

81

votes

4 answers

Stemmers vs Lemmatizers

Natural Language Processing (NLP), especially for English, has evolved to the stage where stemming would become an archaic technology if "perfect" lemmatizers existed. That is because stemmers change the surface form of a word/token into some…

nlp wordnet stemming text-analysis lemmatization

asked Jun 26 '13 at 10:19

alvas

115,346
109
446
738
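
The contrast raised in this question can be sketched in a few lines. This is only an illustration under stated assumptions: `toy_stem` and `toy_lemmatize` are hypothetical helpers, the suffix rules are not the Porter or Snowball algorithms, and `LEMMA_TABLE` is a hand-written stand-in for a real dictionary such as WordNet.

```python
# Hypothetical lookup table standing in for a real lemma dictionary.
LEMMA_TABLE = {
    "ran": "run",
    "running": "run",
    "cacti": "cactus",
    "cactuses": "cactus",
    "cats": "cat",
    "communities": "community",
}


def toy_stem(word):
    """Strip the first matching suffix; the result need not be a real word."""
    for suffix in ("ing", "es", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def toy_lemmatize(word):
    """Return the dictionary form if known, otherwise the word unchanged."""
    return LEMMA_TABLE.get(word, word)


for w in ["cats", "running", "ran", "cacti", "communities"]:
    print(w, toy_stem(w), toy_lemmatize(w))
```

The rule-based function handles regular suffixes but cannot relate "ran" to "run", while the lookup only works for words it already knows; that trade-off is what the question is about.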

45

votes

7 answers

What is the best stemming method in Python?

I tried all the nltk methods for stemming, but they give me weird results with some words. Examples: they often cut the end of a word when they shouldn't (poodle => poodl, article => articl) or don't stem very well: easily and easy are not stemmed in…

python nltk stemming

asked Jul 09 '14 at 07:12

PeYoTlL

3,144
2
17
18

36

votes

3 answers

Stemming algorithm that produces real words

I need to take a paragraph of text and extract from it a list of "tags". Most of this is quite straightforward. However, I now need some help stemming the resulting word list to avoid duplicates. Example: Community / Communities I've used an…

php nlp stemming snowball porter-stemmer

asked Oct 10 '08 at 10:43

Dave

828
1
13
18

33

votes

3 answers

Java library for keywords extraction from input text

I'm looking for a Java library to extract keywords from a block of text. The process should be as follows: stop word cleaning -> stemming -> searching for keywords based on English linguistics statistical information - meaning if a word appears more…

java nlp extract keyword stemming

asked Jul 03 '13 at 11:43

Shay

497
1
4
10
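
The pipeline described in this question (stop-word cleaning, then stemming, then picking keywords) can be sketched as follows. This is a Python illustration rather than the Java library the asker wants, and it rests on assumptions: `STOP_WORDS` is a tiny made-up list, `naive_stem` is a hypothetical suffix stripper and not a real stemming algorithm, and a plain in-document frequency count stands in for the statistical step.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "are", "on", "for"}


def naive_stem(word):
    """Hypothetical helper: handle a few common English suffixes only."""
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"
    if word.endswith("ing") and len(word) > 5:
        return word[:-3]
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


def extract_keywords(text, top_n=3):
    """Tokenize, drop stop words, stem, and return the most frequent stems."""
    tokens = re.findall(r"[a-z]+", text.lower())
    stems = [naive_stem(t) for t in tokens if t not in STOP_WORDS]
    return Counter(stems).most_common(top_n)


print(extract_keywords("The cats and the cat are in communities; a community of cats.", 2))
```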

30

votes

2 answers

Lucene Hebrew analyzer

Does anybody know whether one exists? I've been googling this for months... Thanks

lucene hebrew stemming

asked Jun 30 '09 at 14:01

Roey

849
2
11
20

29

votes

7 answers

Stemming English words with Lucene

I'm processing some English texts in a Java application, and I need to stem them. For example, from the text "amenities/amenity" I need to get "amenit". The function looks like: String stemTerm(String term){ ... } I've found the Lucene Analyzer,…

java lucene stemming porter-stemmer

asked Mar 22 '11 at 13:14

Mulone

3,603
9
47
69

20

votes

4 answers

User Warning: Your stop_words may be inconsistent with your preprocessing

I am following this document clustering tutorial. As input I give a txt file which can be downloaded here. It's a combined file of 3 other txt files separated by \n. After creating a tf-idf matrix I received this warning: "UserWarning:…

vectorization text-processing tf-idf stop-words stemming

asked Aug 03 '19 at 16:23

Karolina Andruszkiewicz

459
1
6
18

20

votes

4 answers

Tokenizer, Stop Word Removal, Stemming in Java

I am looking for a class or method that takes a long string of many hundreds of words and tokenizes it, removes the stop words, and stems it for use in an IR system. For example: "The big fat cat, said 'your funniest guy i know' to the kangaroo..." the…

java tokenize stemming stop-words

asked Nov 03 '09 at 00:04

Phil

665
5
9
14

20

votes

5 answers

Need a python module for stemming of text documents

I need a good Python module for stemming text documents in the pre-processing stage. I found this one http://pypi.python.org/pypi/PyStemmer/1.0.1 but I cannot find the documentation in the link provided. If anyone knows where to find the…

python module preprocessor nlp stemming

asked Apr 29 '12 at 03:11

Kai

953
6
16
37

16

votes

2 answers

Import WordNet In NLTK

I want to import the wordnet dictionary, but when I import Dictionary from wordnet I see this error: for l in open(WNSEARCHDIR+'/lexnames').readlines(): IOError: [Errno 2] No such file or directory: 'C:\\Program Files\\WordNet\\2.0\\dict/lexnames' I…

python dictionary nltk wordnet stemming

asked Jul 12 '11 at 08:00

Masoud Abasian

10,549
6
23
22

15

votes

2 answers

nltk stemmer: string index out of range

I have a set of pickled text documents which I would like to stem using nltk's PorterStemmer. For reasons specific to my project, I would like to do the stemming inside of a django app view. However, when stemming the documents inside the django…

nlp nltk stemming porter-stemmer

asked Jan 07 '17 at 03:48

jkarimi

1,247
2
15
27

15

votes

3 answers

Converting plural to singular in a text file with Python

I have txt files that look like this: word, 23 Words, 2 test, 1 tests, 4 And I want them to look like this: word, 23 word, 2 test, 1 test, 4 I want to be able to take a txt file in Python and convert plural words to singular. Here's my…

python text stemming plural singular

asked Jul 13 '15 at 15:50

theintern

511
2
6
14
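
The transformation asked for here, using the sample lines from the question, might look like the sketch below. It assumes a naive rule set: `singularize` and `singularize_lines` are hypothetical helpers that lowercase the word and handle only the most regular plurals, so a form like "boxes" comes out wrong ("boxe") and irregular plurals are left untouched.

```python
def singularize(word):
    """Naive English singularization; covers only very regular plurals."""
    lower = word.lower()
    if lower.endswith("ies") and len(lower) > 4:
        return lower[:-3] + "y"
    # Leave words such as "class", "cactus" and "analysis" unchanged.
    if lower.endswith("s") and not lower.endswith(("ss", "us", "is")):
        return lower[:-1]
    return lower


def singularize_lines(lines):
    """Rewrite 'word, count' lines, singularizing the word and keeping the count."""
    out = []
    for line in lines:
        word, sep, count = line.partition(",")
        out.append(singularize(word.strip()) + sep + count)
    return out


print(singularize_lines(["word, 23", "Words, 2", "test, 1", "tests, 4"]))
```

For real text, a library with an exception list for irregular nouns would be a safer choice than rules like these.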

13

votes

1 answer

WordListCorpusReader is not iterable

So, I am new to using Python and NLTK. I have a file called reviews.csv which consists of comments extracted from Amazon. I have tokenized the contents of this csv file and written it to a file called csvfile.csv. Here's the code: from…

python csv nltk stemming

asked Oct 28 '17 at 05:26

Aarushi Aiyyar

369
1
5
11

12

votes

4 answers

The reverse process of stemming

I use a Lucene snowball analyzer to perform stemming. The results are not meaningful words. I referred to this question. One of the solutions is to use a database that contains a map between the stemmed version of the word to one stable version of…

java similarity stemming porter-stemmer

asked Feb 28 '12 at 11:30

CTsiddharth

907
12
21
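
The mapping idea mentioned in this question (stem to one stable surface form) can be sketched with an in-memory dictionary instead of a database. This is a Python illustration under assumptions: `build_stem_map` is a hypothetical helper, "most frequent surface form" is one possible choice of stable version, and `toy_stem` is a made-up stand-in for whatever stemmer (for example Snowball) is actually in use.

```python
from collections import Counter, defaultdict


def toy_stem(word):
    """Made-up stand-in for a real stemmer; strips one of a few suffixes."""
    for suffix in ("ies", "y", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def build_stem_map(words, stem):
    """Map each stem to the most frequent original word that produced it."""
    forms = defaultdict(Counter)
    for w in words:
        forms[stem(w)][w] += 1
    return {s: counts.most_common(1)[0][0] for s, counts in forms.items()}


corpus = ["community", "communities", "community", "cats", "cat", "cats"]
stem_map = build_stem_map(corpus, toy_stem)
print(stem_map)
```

Looking up `stem_map[toy_stem(word)]` then returns a readable word for display while the stem is still used for matching.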
