removing stopwords after word tokenizing and lower casing

Question

I am new to NLP and am having trouble with the following task. I want to perform these steps in this order:

1. Sentence tokenize
2. Word tokenize each sentence
3. Lowercase
4. Remove stop words
5. Lemmatize each word

I tried to write a function to do the above task:

import nltk
import numpy as np
import random
import string
from nltk.corpus import stopwords

def text_processing(input_str):
    tokens = nltk.sent_tokenize(input_str)  # sentence tokenizing
    for words in tokens:
        each_word = nltk.word_tokenize(words)  # word tokenizing
        for i in each_word:
            lower_words = i.lower()
            stopwords_removed = [w for w in lower_words if not w in stopwords]
            print(stopwords_removed)

When I call the function:

text_processing(new_doc)

I get this error: TypeError: argument of type 'LazyCorpusLoader' is not iterable. How can I fix it?
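Both problems in the loop can be reproduced without NLTK. In this sketch, FakeCorpusLoader is a hypothetical stand-in for LazyCorpusLoader: like it, the class defines none of __contains__, __iter__ or __getitem__, so an "in" test against an instance raises a TypeError of the same kind. The sketch also shows a second bug in the question's code: lower_words is a single string, so the list comprehension walks over its characters rather than over words. The tiny stopword list is made up for illustration.

```python
class FakeCorpusLoader:
    """Hypothetical stand-in: defines no __contains__, __iter__ or __getitem__."""

    def words(self, lang):
        # Made-up, tiny stopword list for illustration only.
        return ["the", "is", "a"]


loader = FakeCorpusLoader()

# Bug 1: membership test against the loader object itself raises TypeError.
error_message = ""
try:
    "the" in loader
except TypeError as exc:
    # The message names the type, e.g. "argument of type 'FakeCorpusLoader' is not iterable"
    error_message = str(exc)

# Bug 2: iterating over one lowercased word yields its characters.
lower_words = "the"
chars = [w for w in lower_words]  # ['t', 'h', 'e']

# Fix: compare whole tokens against a set of stopword strings.
stop_set = set(loader.words("english"))
tokens = ["the", "cat", "is", "on", "a", "mat"]
filtered = [w for w in tokens if w not in stop_set]  # ['cat', 'on', 'mat']
print(error_message, chars, filtered)
```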

Please avoid the idiom you're using in your code; it iterates through the same sentence several times. See https://stackoverflow.com/questions/26126442/combining-text-stemming-and-removal-of-punctuation-in-nltk-and-scikit-learn/26132560#26132560 - alvas, Nov 07 '18 at 15:48

score 0 · Answer 1 · answered Nov 05 '18 at 07:45

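You can't test membership against stopwords directly: nltk.corpus.stopwords is a lazily loaded corpus reader (a LazyCorpusLoader), not a list of words, so w in stopwords raises the TypeError.
First make sure the resource is downloaded by running this in Jupyter or a Python shell: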

nltk.download()

A downloader window will then appear; choose Corpora -> stopwords and download it.

Then you can build the stopword set like this:

my_stopwords = set(stopwords.words('english'))

References:
https://www.geeksforgeeks.org/removing-stop-words-nltk-python/
NLTK and Stopwords Fail #lookuperror
