I do not understand why this code does not work. When I run it, it prints "After stopwords removal: None". Can anyone help me fix the problem? Many thanks.

 stop_words = ["the", "of", "a", "to", "be", "from", "or"]
 last = lower_words.split()

 for i in stop_words:
     lastone = last.remove(i)
     print "\nAfter stopwords removal:\n",lastone
Behzat

2 Answers

The list.remove() method modifies the list in place and returns None.

So when you do last.remove(i), it will remove the first occurrence of i from the list last and return None, so lastone will always be set to None.
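Both effects are easy to check in isolation. This is a minimal sketch with a made-up word list (the sample data is an assumption, not taken from the question):

```python
# Hypothetical sample data, used only to illustrate list.remove().
last = ["the", "cat", "and", "the", "dog"]

# remove() mutates `last` in place and returns None.
result = last.remove("the")

print(result)  # None
print(last)    # ['cat', 'and', 'the', 'dog'] (only the first "the" is gone)
```

The second "the" is still in the list, which is why a single remove() call per stop word is not enough.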

For what you are trying to do, you probably want all occurrences of each item in stop_words removed, so last.remove() is not the most efficient approach: each call removes only one occurrence and has to rescan the list. Instead, I would use a list comprehension like the following:

stop_words = set(["the", "of", "a", "to", "be", "from", "or"])
last = lower_words.split()
last = [word for word in last if word not in stop_words]

Converting stop_words to a set makes each membership test constant time on average instead of a scan of the whole list; you would get the same result if you left it as a list.
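To try the list comprehension end to end, here is a self-contained sketch. The value of lower_words is made up for illustration; in the question it is presumably a lowercased string defined earlier.

```python
# Assumption: `lower_words` is an already-lowercased string; this value is invented.
lower_words = "the quick brown fox jumps over a lazy dog to be or not to be"

stop_words = set(["the", "of", "a", "to", "be", "from", "or"])

# Keep every word that is not a stop word; all occurrences are dropped in one pass.
last = [word for word in lower_words.split() if word not in stop_words]

print(last)
# ['quick', 'brown', 'fox', 'jumps', 'over', 'lazy', 'dog', 'not']
```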

And for completeness, here is how you would need to do this with remove():

stop_words = ["the", "of", "a", "to", "be", "from", "or"]
last = lower_words.split()
for word in stop_words:
    try:
        while True:
            last.remove(word)
    except ValueError:
        pass
Andrew Clark
  • I have edited my answer to suggest an alternative way to do this. If you still want to do it with `remove()` you can, but you need to put the `remove()` call in a loop inside a try/except block to make sure all occurrences of each word are removed. – Andrew Clark Apr 07 '14 at 22:57
  • Many thanks, F.J. It works properly now. But now I need to undo the split. How can I do that? I mean, I have removed the stopwords and want to print the result as plain text, not as a list. – Behzat Apr 07 '14 at 23:01
  • Use `' '.join(last)`; this returns a string with a single space between each element of `last`. – Andrew Clark Apr 07 '14 at 23:42
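As a small sketch of the join suggestion from the comments, assuming last already holds the filtered words (the list below is made up):

```python
# Hypothetical result of the stopword filtering step.
last = ["quick", "brown", "fox"]

# str.join() glues the list elements together with the separator string.
text = " ".join(last)

print(text)  # quick brown fox
```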

Here is a function that takes a text and returns it with the stopwords removed. It does this by skipping every word that appears in the collection stopwords. I call .lower() on each word before the lookup because most stopword lists are lowercase, while the text may not be.

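def cut_stop_words(text, stopwords):
    # Collect the words whose lowercase form is not a stopword.
    kept = []
    for word in text.split():
        if word.lower() not in stopwords:
            kept.append(word)
    # Join with single spaces so the result has no stray leading space.
    return ' '.join(kept)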
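For comparison, the same case-insensitive filtering can be sketched as a single expression. The name cut_stop_words_compact and the sample sentence are hypothetical, chosen only for this example:

```python
def cut_stop_words_compact(text, stopwords):
    # Hypothetical helper: keep words whose lowercase form is not a stopword,
    # preserving the original casing of the words that remain.
    return " ".join(w for w in text.split() if w.lower() not in stopwords)


stopwords = {"the", "of", "a", "to", "be", "from", "or"}
cleaned = cut_stop_words_compact("The price of a Ticket to Rome", stopwords)

print(cleaned)  # price Ticket Rome
```

Note that "The" is dropped even though the stopword list only contains "the", because the lookup uses the lowercased word.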
panomi