
I am new to NLP and am facing some challenges with the following task. I want to perform these steps, in this order:

1. Sentence tokenize
2. Word tokenize each sentence
3. Lowercase
4. Remove stop words
5. Lemmatize each word

I tried to write a function to do the above tasks:

import nltk
import numpy as np
import random
import string
from nltk.corpus import stopwords

def text_processing(input_str):
    tokens = nltk.sent_tokenize(input_str)  # sentence tokenizing
    for words in tokens:
        each_word = nltk.word_tokenize(words)  # word tokenizing
        for i in each_word:
            lower_words = i.lower()
            stopwords_removed = [w for w in lower_words if not w in stopwords]
            print(stopwords_removed)

When I call the above function:

text_processing(new_doc)

I get this error: TypeError: argument of type 'LazyCorpusLoader' is not iterable. How can I overcome it?

Wazaki
Please avoid the idiom you're using in your code; it iterates through the same sentence several times. See https://stackoverflow.com/questions/26126442/combining-text-stemming-and-removal-of-punctuation-in-nltk-and-scikit-learn/26132560#26132560 – alvas Nov 07 '18 at 15:48
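To make the comment's point concrete, here is a minimal pure-Python sketch of steps 3 and 4 done in a single pass over one sentence's token list. It also sidesteps a second problem in the question's code: the inner list comprehension iterates over lower_words, which is a single string, so it visits characters rather than words. The names clean_tokens and SAMPLE_STOPWORDS are hypothetical; SAMPLE_STOPWORDS is a tiny stand-in for NLTK's English stop-word list.

```python
def clean_tokens(tokens, stop_words):
    """Lowercase each token and drop stop words in one pass over the list."""
    # Iterate over the list of tokens itself, not over a single string,
    # so each element is a whole word rather than a character.
    return [t.lower() for t in tokens if t.lower() not in stop_words]


# Hypothetical stand-in for set(stopwords.words('english')).
SAMPLE_STOPWORDS = {"the", "is", "a", "on"}

print(clean_tokens(["The", "cat", "is", "on", "a", "mat"], SAMPLE_STOPWORDS))
# ['cat', 'mat']
```

In real code you would pass set(stopwords.words('english')) as the second argument.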

1 Answer


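You can't use stopwords directly: it is a LazyCorpusLoader object, not a list of words, so the "not w in stopwords" check fails with that TypeError.
Instead, first download the resources by running this in your Jupyter notebook or Python shell: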

nltk.download()

Then a downloader window will appear; choose Corpora -> stopwords and click Download.
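As a non-interactive alternative to the GUI, nltk.download() also accepts a resource ID such as 'stopwords'. The sketch below fetches what the question's whole pipeline needs. The list of IDs is my assumption (tokenizer and WordNet resource names have changed between NLTK releases, so it requests both variants), and download_resources is a hypothetical helper name.

```python
import nltk

# Assumed resource IDs for the question's pipeline. Recent NLTK releases
# load "punkt_tab" for sent_tokenize/word_tokenize, older ones "punkt";
# some versions also want "omw-1.4" alongside "wordnet" for lemmatizing.
RESOURCES = ["stopwords", "punkt", "punkt_tab", "wordnet", "omw-1.4"]


def download_resources(resources=RESOURCES):
    """Download each resource without opening the GUI.

    Returns a dict mapping resource ID to the value nltk.download()
    returned: True on success, False on failure (by default it reports
    errors such as no network or an unknown ID instead of raising).
    """
    return {name: nltk.download(name, quiet=True) for name in resources}


if __name__ == "__main__":
    print(download_resources())
```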
Then you can load the stop words as a set like this:

my_stopwords = set(stopwords.words('english'))

References:
https://www.geeksforgeeks.org/removing-stop-words-nltk-python/
NLTK and Stopwords Fail #lookuperror

kinolollipop