
"Stemming and lemmatization with Python NLTK for both English and Russian"

Source: http://text-processing.com/demo/stem/

I want to do stemming with Python NLTK for text in both English and Russian.

Could you please advise which library I should use for this task?

Testudinate

1 Answer

6

For stemming:

NLTK includes the Porter stemmer (PorterStemmer), which is widely used for English.

For Russian, people have used NLTK's Snowball stemmer (SnowballStemmer), which supports Russian as well as English.
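As a minimal sketch of the stemming options above, assuming NLTK is installed: the sample words and variable names below are only illustrative, and neither stemmer needs extra corpus downloads when used this way.

```python
from nltk.stem import PorterStemmer, SnowballStemmer

# One stemmer object per language; SnowballStemmer takes a single language name.
porter = PorterStemmer()
snowball_en = SnowballStemmer("english")
snowball_ru = SnowballStemmer("russian")

# Illustrative sample words (not from the original question).
english_words = ["running", "stemming", "libraries"]
russian_words = ["книги", "читающий", "столами"]

english_stems = [snowball_en.stem(word) for word in english_words]
russian_stems = [snowball_ru.stem(word) for word in russian_words]

print("Porter:", [porter.stem(word) for word in english_words])
print("Snowball (en):", english_stems)
print("Snowball (ru):", russian_stems)

# The languages Snowball supports are listed on the class itself.
print(SnowballStemmer.languages)
```

The stems are not always dictionary words; that is expected for a stemmer and is the main reason to consider lemmatization instead.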

For Lemmatization:

I prefer spaCy for lemmatization.

For Russian, someone has been working on this here.

Another lemmatizer for Russian text can be found here.

Ankur Sinha
  • Can I use it for both languages at once? For example, SnowballStemmer("russian", "english") – Testudinate May 24 '18 at 13:01
  • I am not sure whether you can use both together; I have never encountered such texts before. However, you can look at the source here: https://www.nltk.org/_modules/nltk/stem/snowball.html – Ankur Sinha May 24 '18 at 13:28
  • How can I analyze a row that contains words in two languages using the Snowball stemmer? I would need to identify the language of each word and then switch between two Snowball stemmer instances, which is difficult. Maybe someone knows a better approach. – Testudinate May 25 '18 at 07:00
  • Quick question: is spaCy's lemmatization different from NLTK's WordNet lemmatizer? I think they're both ports of the same morphy() code. – alvas May 25 '18 at 07:20
  • I am not sure about the intricate details, but when I was working on my thesis I came across a study comparing different NLP libraries, and spaCy seemed to be better. – Ankur Sinha May 25 '18 at 07:32
  • Agreed (to some extent), but that might not be true when it comes to the lemmatizer, though =) – alvas May 25 '18 at 08:25
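
On the mixed-language question raised in the comments: SnowballStemmer takes one language per instance (its second parameter is ignore_stopwords, not a second language), so one possible approach is to keep one stemmer per language and pick between them per word. The sketch below assumes that a word containing any Cyrillic letter is Russian and everything else is English. That script-based heuristic, and the helper names guess_language and stem_mixed, are assumptions made for illustration and are not part of NLTK.

```python
import re

from nltk.stem import SnowballStemmer

# One stemmer per language, created once and reused.
STEMMERS = {
    "en": SnowballStemmer("english"),
    "ru": SnowballStemmer("russian"),
}

# Assumption: any Cyrillic letter marks the word as Russian.
CYRILLIC = re.compile(r"[А-Яа-яЁё]")
# Runs of letters only (no digits, underscores, or punctuation).
TOKEN = re.compile(r"[^\W\d_]+")


def guess_language(word):
    """Hypothetical helper: choose a language code from the word's script."""
    return "ru" if CYRILLIC.search(word) else "en"


def stem_mixed(text):
    """Stem each word with the stemmer matching its guessed language."""
    return [
        STEMMERS[guess_language(token)].stem(token.lower())
        for token in TOKEN.findall(text)
    ]


print(stem_mixed("I was running and читал книги"))
```

This only works because English and Russian use different scripts; for language pairs that share an alphabet, a real language-identification step would be needed instead.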