How to remove stopwords at the beginning or the end of a string in Python?

Question

I am working with NLTK and prototyping a project I have in mind. I come from PHP, so Python is still somewhat unfamiliar to me.

I have a list of stopwords and an n-word string, n being between 1 and 4.

I want to clean that string by trimming any stopwords from both ends. After removing a stopword I need to retest the string, because there might be another one right after it.

What would be an efficient way to do that in Python?

what about: http://stackoverflow.com/questions/5486337/how-to-remove-stop-words-using-nltk-or-python — jmunsch, Dec 04 '16 at 10:04

score 1 · Answer 1 · answered Dec 04 '16 at 09:33

Tokenize the string into words.

Use set membership tests (`in` against a `set`), which are fast, to drop leading and trailing tokens for as long as they match a stopword.

If the next step really needs a string, then join the list of words back into one string with the idiomatic `' '.join(your_list)`.
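
The three steps above could be sketched roughly as follows. This is only an illustration, not necessarily what the answerer had in mind: the function name `trim_stopwords` is made up, tokenizing with `str.split()` stands in for a real tokenizer such as NLTK's, and the case-insensitive comparison assumes the stopword list is lowercase.

```python
def trim_stopwords(text, stopwords):
    """Strip stopwords from both ends of a string; interior words are kept."""
    # Assumption: the stopwords are lowercase. A set gives fast membership tests.
    stopwords = set(stopwords)

    # Assumption: plain whitespace tokenization is good enough here.
    words = text.split()

    # Re-test the first word after every removal, since another stopword may follow.
    while words and words[0].lower() in stopwords:
        words.pop(0)

    # Same from the right-hand side.
    while words and words[-1].lower() in stopwords:
        words.pop()

    # Join back into a single string for the next step.
    return " ".join(words)


print(trim_stopwords("the state of the", ["the", "of", "a"]))
```

With only 1 to 4 words per string, `words.pop(0)` costs next to nothing; for long token lists it is worth avoiding.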

Peteris


Set membership is the key here. `set.__contains__()` is a constant-time operation on average, whereas `list.__contains__()` is linear time. Also, if your tokens are in a `list`, deleting elements from the front of the list is a linear-time operation, so you could get better performance by optimizing how you strip leading stopwords. – Håken Lid Dec 04 '16 at 11:21
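
One possible way to avoid the repeated front deletions mentioned in this comment is to move two indices inward and slice once at the end. This is a sketch under assumptions: `trim_stopwords_by_index` is a hypothetical name, the tokens are assumed to be already tokenized, and `stopwords` is assumed to be a `set` or `frozenset`.

```python
def trim_stopwords_by_index(words, stopwords):
    """Return the tokens with leading and trailing stopwords removed."""
    start, end = 0, len(words)

    # Skip leading stopwords by advancing an index; nothing is deleted yet.
    while start < end and words[start] in stopwords:
        start += 1

    # Skip trailing stopwords by moving the end index backwards.
    while end > start and words[end - 1] in stopwords:
        end -= 1

    # A single slice copies only the tokens that are kept.
    return words[start:end]


# Assumption: an illustrative stopword set, not NLTK's actual list.
STOPWORDS = frozenset({"the", "of", "a"})
print(trim_stopwords_by_index(["the", "of", "big", "of", "cat", "the"], STOPWORDS))
```

For strings of at most four words the difference from calling `list.pop(0)` is unlikely to be measurable; the index approach mainly matters for long token lists.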
