How to replace random elements of a list with a unique symbol?

Question

I am new to Python programming. I have two lists: the first contains stop words and the other contains the text document. I want to replace the stop words in the text document with "/". Could anyone help?

I tried the replace function, but it raised an error:

text = "This is an example showing off word filtration"
stop = set(stopwords.words("english"))
text = nltk.word_tokenize(document)

for word in stop:
    text = text.replace(stop, "/")
print(text)

It should output "/ / / example showing / word filtration"

Sash Sinha · Accepted Answer · 2019-04-03T12:15:27.443

How about a list comprehension:

>>> from nltk.corpus import stopwords
>>> from nltk.tokenize import word_tokenize  
>>> stop_words = set(stopwords.words('english'))
>>> text = "This is an example showing off word filtration"
>>> text_tokens = word_tokenize(text) 
>>> replaced_text_words = ["/" if word.lower() in stop_words else word for word in text_tokens]
>>> replaced_text_words
['/', '/', '/', 'example', 'showing', '/', 'word', 'filtration']
>>> replaced_sentence = " ".join(replaced_text_words)
>>> replaced_sentence
'/ / / example showing / word filtration'
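
If this is needed in more than one place, the comprehension can be wrapped in a small function. The following is only a sketch: the name `mask_stop_words` and the tiny hard-coded `STOP_WORDS` set are made up for illustration (in practice the set would come from `stopwords.words('english')`), and `str.split` stands in for `word_tokenize`, so punctuation stays attached to the words.

```python
# Hypothetical stand-in for set(stopwords.words('english')); not the real NLTK list.
STOP_WORDS = {"this", "is", "an", "off", "the", "a"}


def mask_stop_words(text, stop_words=STOP_WORDS, symbol="/"):
    """Return text with every stop word replaced by symbol."""
    # str.split is a simplification of nltk's word_tokenize.
    tokens = text.split()
    # Compare in lowercase, because the stop word set is all lowercase.
    masked = [symbol if tok.lower() in stop_words else tok for tok in tokens]
    return " ".join(masked)


print(mask_stop_words("This is an example showing off word filtration"))
# / / / example showing / word filtration
```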

Pete · Answer 2 · 2019-04-03T12:38:35.390

How about using a regex pattern?

Your code could then look like this:

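import re
from nltk.corpus import stopwords

text = "This is an example showing off word filtration"
# The stopwords corpus is all lowercase, so lowercase the text before matching.
text = text.lower()

# Try longer words first so that e.g. "don't" is matched before "don".
stop_words = sorted(stopwords.words('english'), key=len, reverse=True)
# Match any stop word as a whole word, plus the whitespace that follows it.
pattern = re.compile(r'\b(' + '|'.join(map(re.escape, stop_words)) + r')\b\s*')
text = pattern.sub('/ ', text)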

In relation to this post.
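
If lowercasing the whole text is not wanted, one possible variation is to make the pattern itself case-insensitive, which keeps the original casing of the words that are not replaced. This is a sketch under assumptions: `mask_with_regex` is a made-up name, and the short `STOP_WORDS` list is a placeholder for `stopwords.words('english')`.

```python
import re

# Hypothetical stand-in for stopwords.words('english'); not the real NLTK list.
STOP_WORDS = ["this", "is", "an", "off"]


def mask_with_regex(text, stop_words=STOP_WORDS, symbol="/"):
    """Replace whole-word, case-insensitive matches of stop words with symbol."""
    # Longest alternatives first, each escaped so it is matched literally.
    ordered = sorted(stop_words, key=len, reverse=True)
    alternatives = "|".join(re.escape(word) for word in ordered)
    # \b on both sides keeps "is" from matching inside "This" or "showing".
    pattern = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)
    return pattern.sub(symbol, text)


print(mask_with_regex("This is an Example showing off word filtration"))
# / / / Example showing / word filtration
```

Because the pattern does not consume the whitespace after a match, the spacing of the original text is left untouched.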

edited Apr 03 '19 at 12:38

answered Apr 03 '19 at 12:18

`This` is a stop word; it just isn't matching your regex because it isn't completely lowercase, i.e. the stopwords corpus only contains lowercase words. – Sash Sinha Apr 03 '19 at 12:33
You are absolutely right. Thanks a lot! I updated the code above. – Pete Apr 03 '19 at 12:38

score 0 · Answer 3 · answered Apr 03 '19 at 12:06

You should pass word, not stop, to the replace function:

for word in stop:
    text = text.replace(word, "/")
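
Two caveats are worth checking before relying on this. It only works while `text` is still a string; after `word_tokenize` it is a list, and lists have no `replace` method, which is most likely the error from the question. Also, `str.replace` works on substrings, so the `is` inside `This` gets replaced as well. A small sketch of both effects, using a made-up two-word stop list and `str.split` instead of the NLTK corpus and tokenizer:

```python
text = "This is an example"
stop = ["is", "an"]  # hypothetical mini stop list, for illustration only

# 1) A list of tokens has no replace method, so this raises AttributeError.
tokens = text.split()
try:
    tokens.replace("is", "/")
except AttributeError as err:
    error_message = str(err)
    print(error_message)
    # 'list' object has no attribute 'replace'

# 2) On a string, replace acts on substrings, not on whole words.
replaced = text
for word in stop:
    replaced = replaced.replace(word, "/")
print(replaced)
# Th/ / / example
```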

Dharanidhar Reddy

score 0 · Answer 4 · answered Apr 03 '19 at 12:13

You can try this:

' '.join([item if item.lower() not in stop else "/" for item in text])

Sridhar Murali
