2

I'm just trying to check whether a word is English or not. This:

import nltk

english_words = set(nltk.corpus.words.words())
print("revised" in english_words)

results in False. Am I doing something wrong? Is this to be expected? Are there better ways of doing this? Thanks.

cs0815

2 Answers

2

It seems that "revised" is indeed not in the word list:

import nltk

english_words = set(nltk.corpus.words.words())

for w in english_words:
    if w.startswith("revise"):
        print(w)

prints the following words (in arbitrary order, since a set is unordered):

reviser
revise
revisee
revisership

According to this source (section 4.1), the word list originates from:

The Words Corpus is the /usr/share/dict/words file from Unix

So you'll have to decide for your use case if the provided word list from NLTK is enough or if you want to switch to a more complete (and bigger) one.
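As the output above shows, the list contains the base form "revise" but not the inflected form "revised". If you want to keep the NLTK list, one option is to also try a few guessed base forms before giving up. The following is only a minimal sketch: the helper names `candidate_bases` and `looks_english` and the suffix rules are made up for illustration, the rules are crude and will produce false positives, and `known_words` is assumed to contain lowercase entries.

```python
def candidate_bases(word):
    """Yield the word itself, then crude guesses at its base form."""
    word = word.lower()
    yield word
    # Hypothetical suffix rules as (suffix, replacement) pairs; far from exhaustive.
    rules = [
        ("ies", "y"),
        ("es", ""),
        ("s", ""),
        ("ed", ""),
        ("d", ""),
        ("ing", ""),
        ("ing", "e"),
    ]
    for suffix, replacement in rules:
        # Require at least two characters to remain after stripping the suffix.
        if word.endswith(suffix) and len(word) > len(suffix) + 1:
            yield word[: -len(suffix)] + replacement


def looks_english(word, known_words):
    """Return True if the word or one of its guessed base forms is known."""
    return any(base in known_words for base in candidate_bases(word))


# Small stand-in for the NLTK list, using the entries printed above.
sample_words = {"revise", "reviser", "revisee", "revisership"}

print(looks_english("revised", sample_words))   # matches via "revise"
print(looks_english("revising", sample_words))  # matches via "revise"
print(looks_english("qwxz", sample_words))      # no candidate matches
```

With the real corpus you would pass something like `{w.lower() for w in nltk.corpus.words.words()}` as `known_words`. A proper lemmatizer would be more reliable than hand-written suffix rules.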

adrianus
1

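Try this:

from nltk.corpus import wordnet

word_to_test = "revised"  # example word; replace with the word to check

# wordnet.synsets() returns an empty list when WordNet has no entry for the word
if not wordnet.synsets(word_to_test):
    print("Not an English word")
else:
    print("English word")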
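One caveat with this approach: WordNet only covers nouns, verbs, adjectives and adverbs, so function words such as "the" are unlikely to have any synsets. A possible way around that is to check a plain word list first and fall back to WordNet only when the list has no match. The sketch below assumes a hypothetical helper `is_english`; the `synset_lookup` parameter and the `fake_synsets` stand-in exist only so the example can be tried without the WordNet data installed.

```python
def is_english(word, word_list, synset_lookup=None):
    """Return True if the word is in word_list or has at least one synset."""
    if word.lower() in word_list:
        return True
    if synset_lookup is None:
        # Imported lazily so the word-list path works without NLTK installed.
        from nltk.corpus import wordnet
        synset_lookup = wordnet.synsets
    return bool(synset_lookup(word))


# Stand-in lookup for illustration only; it pretends that only "revised"
# has a synset. Real use would omit the third argument.
def fake_synsets(word):
    return ["dummy-synset"] if word == "revised" else []


small_list = {"the", "revise"}
print(is_english("the", small_list, fake_synsets))      # found in the word list
print(is_english("revised", small_list, fake_synsets))  # found via the lookup
print(is_english("qwxz", small_list, fake_synsets))     # found in neither
```

In real use, `word_list` could be the lowercased NLTK words corpus from the question, and leaving out `synset_lookup` makes the function call `wordnet.synsets` directly.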
Amit Gupta