I am new to Python programming. I have two lists: the first contains stop words and the other contains the text document. I want to replace the stop words in the text document with "/". Could anyone help?

I tried the replace function, but it gave an error:

text = "This is an example showing off word filtration"
stop = set(stopwords.words("english"))
text = nltk.word_tokenize(document)

for word in stop:
    text = text.replace(stop, "/")
print(text)

It should output "/ / / example showing / word filtration"

thereal90

4 Answers

1

How about a list comprehension:

>>> from nltk.corpus import stopwords
>>> from nltk.tokenize import word_tokenize  
>>> stop_words = set(stopwords.words('english'))
>>> text = "This is an example showing off word filtration"
>>> text_tokens = word_tokenize(text) 
>>> replaced_text_words = ["/" if word.lower() in stop_words else word for word in text_tokens]
>>> replaced_text_words
['/', '/', '/', 'example', 'showing', '/', 'word', 'filtration']
>>> replaced_sentence = " ".join(replaced_text_words)
>>> replaced_sentence
'/ / / example showing / word filtration'
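If you want to try the same idea without downloading the NLTK data first, here is a minimal sketch. It assumes a small hand-written stop-word set and plain whitespace splitting instead of `word_tokenize`; `replace_stop_words` and `demo_stop_words` are names made up for this illustration, not part of NLTK.

```python
def replace_stop_words(text, stop_words, placeholder="/"):
    """Replace each whitespace-separated token that is a stop word."""
    tokens = text.split()
    # Compare in lowercase so that "This" matches the stop word "this".
    replaced = [placeholder if tok.lower() in stop_words else tok for tok in tokens]
    return " ".join(replaced)


# Hypothetical stop-word set, standing in for stopwords.words("english").
demo_stop_words = {"this", "is", "an", "off"}

result = replace_stop_words(
    "This is an example showing off word filtration", demo_stop_words
)
print(result)  # / / / example showing / word filtration
```

Note that `str.split` leaves punctuation attached to words (for example `"is,"`), so for real text a proper tokenizer such as `word_tokenize` is probably the better choice.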
Sash Sinha
1

How about using a regex pattern?

Your code could then look like this:

import re

from nltk.corpus import stopwords

text = "This is an example showing off word filtration"
text = text.lower()  # the stopwords corpus contains only lowercase words

# Match any stop word as a whole word, plus any whitespace that follows it.
pattern = re.compile(r'\b(' + r'|'.join(stopwords.words('english')) + r')\b\s*')
text = pattern.sub('/ ', text)

In relation to this post.
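A possible variation, if you would rather not lowercase the whole text: compile the pattern with `re.IGNORECASE` and escape each word with `re.escape`. This is only a sketch; `demo_stop_words` is a hypothetical short list standing in for `stopwords.words('english')`.

```python
import re

# Hypothetical short list, standing in for stopwords.words('english').
demo_stop_words = ["this", "is", "an", "off"]

# re.escape guards against any regex metacharacters in the word list;
# \b on both sides restricts matches to whole words.
pattern = re.compile(
    r"\b(?:" + "|".join(re.escape(w) for w in demo_stop_words) + r")\b",
    flags=re.IGNORECASE,
)

text = "This is an Example showing off word filtration"
result = pattern.sub("/", text)
print(result)  # / / / Example showing / word filtration
```

The words that are not stop words keep their original capitalisation, because only the matching is case-insensitive.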

Pete
  • `This` is a stop word; it is just not matching your regex because it is not completely lowercase, i.e. the stopwords corpus only contains lowercase words. – Sash Sinha Apr 03 '19 at 12:33
  • You are absolutely right. Thanks a lot! I updated the code above. – Pete Apr 03 '19 at 12:38
0

You should pass `word`, not `stop`, to the replace function:

for word in stop:
    text = text.replace(word, "/")
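One caveat worth checking before relying on this: `replace` is a string method, so it has to be called on the original string rather than on the token list, and it replaces substrings, not whole words. The sketch below illustrates this with a hypothetical four-word list (`demo_stop_words`) in place of the NLTK corpus, and compares it with a token-by-token replacement.

```python
text = "This is an example showing off word filtration"

# Hypothetical short list, standing in for stopwords.words("english").
demo_stop_words = ["this", "is", "an", "off"]

# str.replace works on substrings, so the "is" inside "This" is hit too,
# and the capitalised "This" never matches the lowercase "this".
substring_result = text
for word in demo_stop_words:
    substring_result = substring_result.replace(word, "/")
print(substring_result)  # Th/ / / example showing / word filtration

# Comparing whole tokens (in lowercase) avoids both problems.
token_result = " ".join(
    "/" if tok.lower() in demo_stop_words else tok for tok in text.split()
)
print(token_result)  # / / / example showing / word filtration
```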
0

You can try this:

' '.join([item if item.lower() not in stop else "/" for item in text])
Sridhar Murali